1. Introduction and Aims

We have quality-controlled the 10X data and the SS2 data and now are left with the following objects:

10X 5K data - pb_sex_filtered

10X 30K data - pb_30k_sex_filtered

SS2 mutant data - ss2_mutants_final

2. Read in the data

Load/Install the Required Packages

[1] "patchwork is loaded correctly"

Attaching package: ‘viridis’

The following object is masked from ‘package:viridisLite’:

    viridis.map
[1] "viridis is loaded correctly"
[1] "Seurat is loaded correctly"
Loading required package: cowplot

Attaching package: ‘cowplot’

The following object is masked from ‘package:patchwork’:

    align_plots
[1] "cowplot is loaded correctly"
Loading required package: gridExtra
[1] "gridExtra is loaded correctly"
Loading required package: grid
[1] "grid is loaded correctly"
Loading required package: Hmisc
Loading required package: lattice
Loading required package: survival
Loading required package: Formula
Loading required package: ggplot2

Attaching package: ‘Hmisc’

The following object is masked from ‘package:Seurat’:

    Key

The following objects are masked from ‘package:base’:

    format.pval, units
[1] "Hmisc is loaded correctly"
Loading required package: reshape2
[1] "reshape2 is loaded correctly"
Loading required package: dplyr

Attaching package: ‘dplyr’

The following objects are masked from ‘package:Hmisc’:

    src, summarize

The following object is masked from ‘package:gridExtra’:

    combine

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
[1] "dplyr is loaded correctly"
Loading required package: Nebulosa
[1] "Nebulosa is loaded correctly"
Loading required package: monocle3
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:dplyr’:

    combine, intersect, setdiff, union

The following object is masked from ‘package:gridExtra’:

    combine

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
    Reduce, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit, which, which.max, which.min

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")',
    and for packages 'citation("pkgname")'.


Attaching package: ‘Biobase’

The following object is masked from ‘package:Hmisc’:

    contents

Loading required package: SingleCellExperiment
Loading required package: SummarizedExperiment
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:dplyr’:

    first, rename

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges

Attaching package: ‘IRanges’

The following objects are masked from ‘package:dplyr’:

    collapse, desc, slice

Loading required package: GenomeInfoDb
Loading required package: DelayedArray
Loading required package: matrixStats

Attaching package: ‘matrixStats’

The following objects are masked from ‘package:Biobase’:

    anyMissing, rowMedians

The following object is masked from ‘package:dplyr’:

    count


Attaching package: ‘DelayedArray’

The following objects are masked from ‘package:matrixStats’:

    colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges

The following objects are masked from ‘package:base’:

    aperm, apply, rowsum


Attaching package: ‘SummarizedExperiment’

The following object is masked from ‘package:Seurat’:

    Assays


Attaching package: ‘monocle3’

The following objects are masked from ‘package:Biobase’:

    exprs, fData, fData<-, pData, pData<-
[1] "monocle3 is loaded correctly"

Read in the Data

screen hits

## EDIT - change this to the excel table once we have it finalized for the screen
screen_hits <- c("PBANKA-0516300",
"PBANKA-1217700",
"PBANKA-0409100",
"PBANKA-1034300",
"PBANKA-1437500",
"PBANKA-0827500",
"PBANKA-0824300",
"PBANKA-1426900",
"PBANKA-0105300",
"PBANKA-0921100",
"PBANKA-1002400",
"PBANKA-0829400",
"PBANKA-1347200",
"PBANKA-0828000",
"PBANKA-0902300",
"PBANKA-1418100",
"PBANKA-1435200",
"PBANKA-1454800",
"PBANKA-0712300",
"PBANKA-0410500",
"PBANKA-1144800",
"PBANKA-1231600",
"PBANKA-0503200",
"PBANKA-0308900",
"PBANKA-1214700",
"PBANKA-0709900",
"PBANKA-0311900",
"PBANKA-0716500",
"PBANKA-1447900",
"PBANKA-0102200",
"PBANKA-0713500",
"PBANKA-0102400",
"PBANKA-1302700",
"PBANKA-1235900",
"PBANKA-0401100",
"PBANKA-0413400",
"PBANKA-1126900",
"PBANKA-1425900",
"PBANKA-0418300",
"PBANKA-1464600",
"PBANKA-0806000")

Read in gene annotations

gene_annotations <- read.table("../data/Reference/GenesByTaxon_Summary.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)
dim(gene_annotations)

## convert _ to -
gene_annotations$Gene.ID <- gsub("_", "-", gene_annotations$Gene.ID)

load in datasets

paste("10x dataset")
[1] "10x dataset"
pb_sex_filtered
An object of class Seurat 
5098 features across 6191 samples within 1 assay 
Active assay: RNA (5098 features, 2000 variable features)
 2 dimensional reductions calculated: pca, umap
paste("Smart-seq2 dataset")
[1] "Smart-seq2 dataset"
ss2_mutants_final
An object of class Seurat 
5245 features across 2717 samples within 1 assay 
Active assay: RNA (5245 features, 2000 variable features)
 2 dimensional reductions calculated: pca, umap
paste("The composition of the Smart-seq2 dataset is:")
[1] "The composition of the Smart-seq2 dataset is:"
table(ss2_mutants_final@meta.data$genotype)

Mutant     WT 
  2028    689 

3. Merging the Smart-seq2 and 10X Data

Prepare data

tenx_5k_counts_to_integrate
An object of class Seurat 
5098 features across 6191 samples within 1 assay 
Active assay: RNA (5098 features, 0 variable features)

We need to make sure the mutant data is compatible with the 10X data. the 10X data has fewer genes represented so we need to find the intersect of the two before integration.

GCSKO_mutants
An object of class Seurat 
5018 features across 2717 samples within 1 assay 
Active assay: RNA (5018 features, 0 variable features)
## double check that this is the same number of genes
## subset counts so that only genes represented in the other two objects are there:
length(intersect(rownames(tenx_5k_counts), rownames(mutant_counts_for_integration)))
[1] 5018

IMPORTANT - this next step is different to GCSKO_merge as it subsets the smart-seq2 data into wild-type only.

## subset wt on ss2 data:
ss2_wt_cells <- rownames(GCSKO_mutants@meta.data[GCSKO_mutants@meta.data$genotype == "WT", ])
GCSKO_mutants_wtonly <- subset(GCSKO_mutants, cells = ss2_wt_cells)

create list and normalise:

## make list
tenx.justwt.list <- list(tenx_5k_counts_to_integrate, GCSKO_mutants_wtonly)

## prepare data
for (i in 1:length(tenx.justwt.list)) {
    tenx.justwt.list[[i]] <- NormalizeData(tenx.justwt.list[[i]], verbose = FALSE)
    tenx.justwt.list[[i]] <- FindVariableFeatures(tenx.justwt.list[[i]], selection.method = "vst", 
        nfeatures = 2000, verbose = FALSE)
    all.genes <- rownames(tenx.justwt.list[[i]])
    tenx.justwt.list[[i]] <- ScaleData(tenx.justwt.list[[i]], features = all.genes)
}

Integrate objects

## Find anchors
tenx.justwt.anchors <- FindIntegrationAnchors(object.list = tenx.justwt.list, dims = 1:21, verbose = FALSE)

## Integrate data
tenx.justwt.integrated <- IntegrateData(anchorset = tenx.justwt.anchors, dims = 1:21, verbose = FALSE, features.to.integrate = genes_in_tenx_dataset)

4. Dimensionality reduction

PCA

Optimised UMAP

After optimisation, the following UMAP can be calculated:

## Run optimised UMAP
tenx.justwt.integrated <- RunUMAP(tenx.justwt.integrated, reduction = "pca", dims = 1:10, n.neighbors = 50, seed.use = 1234, min.dist = 0.4, repulsion.strength = 0.03, local.connectivity = 150)

Make final plots:


## split seurat object up
ob.list <- SplitObject(tenx.justwt.integrated, split.by = "experiment")

## make plots for each object
plot.list <- lapply(X = ob.list, FUN = function(x) {
    DimPlot(x, dims = c(1,2), pt.size = 1) + theme(legend.position="bottom")
})

composition_umap_10x <- plot.list$`tenx_5k` + 
  coord_fixed() +
  theme_void() +
  scale_color_manual(values=c(replicate(45, "#999999"))) +
  labs(title = paste("10x (wild-type)")) +
  theme(legend.position = "none", plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold"))

composition_umap_ss2 <- plot.list$mutants +
  coord_fixed() +
  theme_void() +
  scale_color_manual(values=c(replicate(46, "#999999"))) +
  labs(title = paste("Smart-seq2 (wild-type)")) +
  theme(legend.position = "none", plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold"))

composition_umap <- composition_umap_10x + composition_umap_ss2 

composition_umap

save

ggsave("../images_to_export/composition_umap.png", plot = composition_umap, device = "png", path = NULL, scale = 1, width = 20, height = 20, units = "cm", dpi = 300, limitsize = TRUE)
## make plots
## hoo dataset correlation
UMAP_hoo <- DimPlot(tenx.justwt.integrated, pt.size = 0.1, group.by = "Prediction.Spearman.", dims = c(1,2)) +
  coord_fixed() + 
  theme_void() +
  labs(title = paste("Hoo Predicted Timepoint")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) +
  scale_colour_manual(values = inferno(12))  +
  labs(colour = "hour") +
  theme(legend.position = "bottom", legend.title=element_text(size=10))

## ap2g timecourse in this paper correlation
UMAP_kasia <- DimPlot(tenx.justwt.integrated, pt.size = 0.1, group.by = "Prediction.Spearman._Kasia", dims = c(1,2)) +
  coord_fixed() + 
  theme_void() +
  labs(title = paste("AP2G Timecourse Predicted Timepoint")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) +
  scale_colour_manual(values = inferno(10))  +
  labs(colour = "hour") +
  theme(legend.position = "bottom", legend.title=element_text(size=10))

## combine
umap_bulk <- wrap_plots(UMAP_hoo, UMAP_kasia, ncol = 2)

## print
umap_bulk

MCA Mapping

Build the index

## load in mca data
counts <- read.csv("../data/Reference/scmap/allpb10x_counts.csv", row.names = 1)
pheno <- read.csv("../data/Reference/scmap/allpb10x_pheno.csv")

## change dash
rownames(counts) <- gsub("_", "-", rownames(counts))

## load MCA pal
mca_colors <- c("6"="#78C679",
            "2"="#D1EC9F",
            "0"="#FEB24C",
            "1"="#F4CF63",
            "3"="#FEEEAA",
            "4"="#85B1D3",
            "7"="#9ecae1",
            "5"="#C9E8F1",
            "M"= "#B7B7D8",
            "F"="#9C96C6",
            "unassigned"="darkgrey")

## plot
ggplot(pheno, aes(x=PC2_3d, y = PC3_3d, colour=absclust3)) + 
  geom_point() +
  theme_classic() +
  scale_color_manual(values = mca_colors, labels = (c("0 - early troph", "1 - mid troph","2 - late ring", "3 - late troph","4 - early schizont", "5 - late schizont", "6 - early ring","7 - mid schizont", "female", "male", "unassigned")))

###Making an ortholog reference index

## load required libraries
library(scmap)
Creating a generic function for ‘toJSON’ from package ‘jsonlite’ in package ‘googleVis’
library(SingleCellExperiment)
Loading required package: SummarizedExperiment
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply,
    parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call,
    duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map,
    mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
    Reduce, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which,
    which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor,
    see 'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: DelayedArray
Loading required package: matrixStats

Attaching package: ‘matrixStats’

The following objects are masked from ‘package:Biobase’:

    anyMissing, rowMedians


Attaching package: ‘DelayedArray’

The following objects are masked from ‘package:matrixStats’:

    colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges

The following objects are masked from ‘package:base’:

    aperm, apply, rowsum


Attaching package: ‘SummarizedExperiment’

The following object is masked from ‘package:Seurat’:

    Assays
#prep the SCE, if was originally a Suerat object need the dfs to be regular matrices
#pb_filtered_sce_orth <- pb_filtered_sce_orth[, colData(pb_filtered_sce_orth)$absclust3 != "8"]
#sce <- pb_filtered_sce_orth
#pca <- plotPCA(sce)
#pcs <- pca$data
#table(rownames(pcs)==colnames(sce))
#colData(sce) <- cbind(colData(sce), pcs)
#rowData(sce)$feature_symbol <- rowData(sce)$gene
sce <- SingleCellExperiment(list(counts=counts),
    colData=DataFrame(label=pheno),
    rowData=DataFrame(feature_symbol=rownames(counts)))
sce
class: SingleCellExperiment 
dim: 4890 4884 
metadata(0):
assays(1): counts
rownames(4890): PBANKA-0000301 PBANKA-0000600 ... PBANKA-MIT0350 PBANKA-MIT0360
rowData names(1): feature_symbol
colnames(4884): AAACCTGAGCACCGTC AAACCTGAGCGCTTAT ... AGCTTGACAATAGAGT CCGGGATCACGTTGGC
colData names(21): label.X label.sample_id ... label.PC2_3d label.PC3_3d
reducedDimNames(0):
altExpNames(0):
counts_1 <- assay(sce, "counts")
libsizes <- colSums(counts_1)
size.factors <- libsizes/mean(libsizes)
logcounts(sce) <- log2(t(t(counts_1)/size.factors) + 1)
#counts(sce) <- as.matrix(counts(sce))
#logcounts(sce) <- as.matrix(logcounts(sce))
# remove features with duplicated names
sce <- sce[!duplicated(rownames(sce)), ]

#build scmap-cell reference index, save this rds
sce <- selectFeatures(sce, suppress_plot = FALSE, n_features = 500)

table(rowData(sce)$scmap_features)

FALSE  TRUE 
 4390   500 
set.seed(1)
sce <- indexCell(sce)
Parameter M was not provided, will use M = n_features / 10 (if n_features <= 1000), where n_features is the number of selected features, and M = 100 otherwise.
Parameter k was not provided, will use k = sqrt(number_of_cells)
names(metadata(sce)$scmap_cell_index)
[1] "subcentroids" "subclusters" 
length(metadata(sce)$scmap_cell_index$subcentroids)
[1] 50
dim(metadata(sce)$scmap_cell_index$subcentroids[[1]])
[1] 10 69
metadata(sce)$scmap_cell_index$subcentroids[[1]][,1:5]
                         1           2          3           4           5
PBANKA-0100021 0.000000000 0.002174527 0.00000000 0.000000000 0.001882238
PBANKA-0100400 0.674777171 0.000000000 0.71839411 0.000000000 0.000000000
PBANKA-0100700 0.000000000 0.000000000 0.06346152 0.000000000 0.003406199
PBANKA-0100900 0.026157737 0.000000000 0.03578261 0.000000000 0.439794449
PBANKA-0102700 0.000000000 0.000000000 0.00000000 0.007797631 0.010978344
PBANKA-0103200 0.000000000 0.000000000 0.00000000 0.000000000 0.000000000
PBANKA-0105100 0.000000000 0.000000000 0.00000000 0.000000000 0.000000000
PBANKA-0107300 0.007199727 0.999997636 0.65054584 0.999969598 0.898011255
PBANKA-0108800 0.734420349 0.000000000 0.23174764 0.000000000 0.004578116
PBANKA-0109900 0.067575567 0.000000000 0.04105225 0.000000000 0.000000000
dim(metadata(sce)$scmap_cell_index$subclusters)
[1]   50 4884
#saveRDS(pb_filtered_sce_orth, file="pb_filtered_sce_orthindex_20181109.rds")

Map to the index

## convert seurat object above into sce for this
pf_field_counts <- as.matrix(tenx.justwt.integrated@assays$RNA@data)
pf_field_pheno <- tenx.justwt.integrated@meta.data

#prep the SCE, if was originally a Suerat object need the dfs to be regular matrices
#pb_filtered_sce_orth <- pb_filtered_sce_orth[, colData(pb_filtered_sce_orth)$absclust3 != "8"]
#sce <- pb_filtered_sce_orth
#pca <- plotPCA(sce)
#pcs <- pca$data
#table(rownames(pcs)==colnames(sce))
#colData(sce) <- cbind(colData(sce), pcs)
#rowData(sce)$feature_symbol <- rowData(sce)$gene
pf_live_field.sce.orth <- SingleCellExperiment(list(counts=pf_field_counts),
    colData=DataFrame(label=pf_field_pheno),
    rowData=DataFrame(feature_symbol=rownames(pf_field_counts)))
pf_live_field.sce.orth
class: SingleCellExperiment 
dim: 5098 6880 
metadata(0):
assays(1): counts
rownames(5098): PBANKA-0000101 PBANKA-0000301 ... PBANKA-MIT0360 PBANKA-MIT0370
rowData names(1): feature_symbol
colnames(6880): AAACCTGGTAAGGGCT AAACCTGGTGCACTTA ... SC25027_8_95 SC25027_8_96
colData names(126): label.orig.ident label.nCount_RNA ... label.pt_id_cols
  label.seurat_clusters_dot_plotting
reducedDimNames(0):
altExpNames(0):
counts_1 <- assay(pf_live_field.sce.orth, "counts")
libsizes <- colSums(counts_1)
size.factors <- libsizes/mean(libsizes)
logcounts(pf_live_field.sce.orth) <- log2(t(t(counts_1)/size.factors) + 1)


#Project query data set onto cell index
scmapCell_results <- scmapCell(
  pf_live_field.sce.orth, 
  list(mca = metadata(sce)$scmap_cell_index
  )
)

##Look into the results
# For each dataset there are two matricies. cells matrix contains the top 10 (scmap default) cell IDs of the cells of the reference dataset that a given cell of the projection dataset is closest to:
#   
#   Give assignments in two ways:
#   1. Take the top cell assignment abs clust, if cosine similarity is less than 0.4 (or adjust if needed) mark as unassigned
# 2. For the top 3 nearest neighbors, get a mean of the PCA coordinates and snap to the nearest cell of those coordinates. If any of the top three cells are sim below 0.4 then mark as unassigned.


##Top cell assignment method
scmapCell_results$mca$cells[, 1:3]
      AAACCTGGTAAGGGCT AAACCTGGTGCACTTA AAACGGGTCGTCCAGG
 [1,]             1591             2340             2220
 [2,]             2310             2185             2122
 [3,]             4699             2242             2350
 [4,]             2137             1591             2242
 [5,]             2121             2430             1653
 [6,]             2092             2108             2457
 [7,]             1091             1094             2175
 [8,]             4680             2090             2170
 [9,]             2226             4122             2365
[10,]             2136             2226             2226
getcells <- scmapCell_results$mca$cells[1, ]
cdsce <- colData(sce)[getcells, ]
topsim <- scmapCell_results$mca$similarities[1, ]

pf_live_field.sce.orth$topcell <- cdsce$label.sample_id
pf_live_field.sce.orth$topcell_ac <- cdsce$label.absclust3
pf_live_field.sce.orth$indexPC1 <- cdsce$label.PC1
pf_live_field.sce.orth$indexPC2 <- cdsce$label.PC2
#pf_live_field.sce.orth$pbpt <- cdsce$pseudotime
#pf_live_field.sce.orth$pbbulk <- cdsce$bulk
pf_live_field.sce.orth$topcell_sp <- pf_live_field.sce.orth$topcell_ac
pf_live_field.sce.orth$topsim <- topsim
pf_live_field.sce.orth$topcell_sp[pf_live_field.sce.orth$topsim < 0.3] <- "unassigned"
table(pf_live_field.sce.orth$topcell_sp)

   0    1    2    3    4    5    6    7    F    M 
1766  534 3478  123   73   71  103   37  353  342 

add these metrics back into the object

## add the meta data from scmap to the object
# live
## extract
df_plot_live <- data.frame(topcell_sp = colData(pf_live_field.sce.orth)$topcell_sp, topsim = colData(pf_live_field.sce.orth)$topsim, row.names = rownames(colData(pf_live_field.sce.orth)))
## add to object
tenx.justwt.integrated <- AddMetaData(
  object = tenx.justwt.integrated,
  metadata = df_plot_live,
  col.name = c('mca_topcell_sp', 'mca_topsim')
)

## need to fix the NA in topcell_sp
# Un-factorize (as.numeric can be use for numeric values)
#              (as.vector  can be use for objects - not tested)
tenx.justwt.integrated@meta.data$mca_topcell_sp <- as.character(tenx.justwt.integrated@meta.data$mca_topcell_sp)
## add unassigned
tenx.justwt.integrated@meta.data$mca_topcell_sp[is.na(tenx.justwt.integrated@meta.data$mca_topcell_sp)] <- "unassigned"
# Re-factorize with the as.factor function or simple factor(fixed$Type)
tenx.justwt.integrated@meta.data$mca_topcell_sp <- as.factor(tenx.justwt.integrated@meta.data$mca_topcell_sp)
## check
table(tenx.justwt.integrated@meta.data$mca_topcell_sp)

   0    1    2    3    4    5    6    7    F    M 
1766  534 3478  123   73   71  103   37  353  342 
## reorder levels
tenx.justwt.integrated@meta.data$mca_topcell_sp <- factor(tenx.justwt.integrated@meta.data$mca_topcell_sp, levels = rev(c("M", "F", "5", "7", "4", "3", "1", "0", "2", "6")))

UMAP_mca <- DimPlot(tenx.justwt.integrated, pt.size = 0.1, group.by = "mca_topcell_sp", dims = c(1,2)) +
  coord_fixed() + 
  theme_void() +
  labs(title = paste("MCA Predicted Cell Type"), colour = "MCA Cell Type") + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) +
  scale_color_manual(values = mca_colors, labels = (c("6 - early ring", "2 - late ring", "0 - early troph", "1 - mid troph", "3 - late troph", "4 - early schizont", "7 - mid schizont", "5 - late schizont", "female", "male"))) +
  theme(legend.position = "right", legend.title=element_text(size=15))

UMAP_mca

save

ggsave("../images_to_export/UMAP_mca.png", plot = UMAP_mca, device = "png", path = NULL, scale = 1, width = 15, height = 15, units = "cm", dpi = 300, limitsize = TRUE)

5. Clustering

Generate clusters

## generate new clusters at low resolution
tenx.justwt.integrated <- FindNeighbors(tenx.justwt.integrated, dims = 1:19, reduction = "pca")
tenx.justwt.integrated <- FindClusters(tenx.justwt.integrated, resolution = 2, random.seed = 42, algorithm = 2)

visualise clusters

DimPlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE, repel = FALSE, label = TRUE)

Make individual plots highlighting where cells in each cluster fall

plot

## this function writes the next bit of code for you
## put it into the console and paste the response
#ploty <- c()
#for(i in seq_along(levels(tenx.justwt.integrated@meta.data$seurat_clusters))){
#  ploty <- paste0(ploty, "list_UMAPs_by_cluster[[", i, "]]", " + ")
#}

## plot
list_UMAPs_by_cluster[[1]] + list_UMAPs_by_cluster[[2]] + list_UMAPs_by_cluster[[3]] + list_UMAPs_by_cluster[[4]] + list_UMAPs_by_cluster[[5]] + list_UMAPs_by_cluster[[6]] + list_UMAPs_by_cluster[[7]] + list_UMAPs_by_cluster[[8]] + list_UMAPs_by_cluster[[9]] + list_UMAPs_by_cluster[[10]] + list_UMAPs_by_cluster[[11]] + list_UMAPs_by_cluster[[12]] + list_UMAPs_by_cluster[[13]] + list_UMAPs_by_cluster[[14]] + list_UMAPs_by_cluster[[15]] + list_UMAPs_by_cluster[[16]] + list_UMAPs_by_cluster[[17]] + list_UMAPs_by_cluster[[18]] + list_UMAPs_by_cluster[[19]] + list_UMAPs_by_cluster[[20]] + list_UMAPs_by_cluster[[21]] + list_UMAPs_by_cluster[[22]] + list_UMAPs_by_cluster[[23]] + list_UMAPs_by_cluster[[24]]

Find Markers

markers_annotated <- merge(pb.markers, gene_annotations,  by.x = "gene", by.y = "Gene.ID", all = FALSE)
View(markers_annotated)
markers_annotated <- merge(pb.markers, gene_annotations,  by.x = "gene", by.y = "Gene.ID", all = FALSE)
View(markers_annotated)
markers_subset <- pb.markers %>% group_by(cluster) %>% top_n(n = 20, wt = avg_logFC)
markers_subset_annotated <- merge(markers_subset, gene_annotations,  by.x = "gene", by.y = "Gene.ID", all = FALSE)
markers_subset_annotated
View(markers_subset_annotated)
View(markers_annotated_mutant <- markers_annotated[which(markers_annotated$gene %in% list_of_mutant_genes), ])

visualise markers

## get the top 10 for each cluster
top10 <- pb.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)

## scale data is missing - but original dfs have scale.data
# df_scaled <- as.matrix(transform(merge(as.data.frame(tenx.justwt.list[[1]]@assays$RNA@scale.data), as.data.frame(tenx.justwt.list[[2]]@assays$RNA@scale.data), by = 0, all=FALSE), row.names=Row.names, Row.names=NULL))
# tenx.justwt.integrated@assays$RNA@scale.data <- df_scaled

## scale data in RNA slot
all.genes <- rownames(tenx.justwt.integrated)
tenx.justwt.integrated <- ScaleData(tenx.justwt.integrated, vars.to.regress = 'experiment', features = all.genes)

## heatmap
DoHeatmap(tenx.justwt.integrated, features = top10$gene) + NoLegend()

6. Define Cluster Identities

Downsampling

# ## check number of cells in each cluster
# df_clusters <- as.data.frame(table(tenx.justwt.integrated@meta.data$integrated_snn_res.2))
# plot(df_clusters)
# 
# ## downsample using appropriate metrics
# tenx.justwt.integrated.downsampled <- subset(tenx.justwt.integrated, downsample = 100)
#   
# ## inspect plot
# FeaturePlot(tenx.justwt.integrated.downsampled, features = "PBANKA-1437500", coord.fixed = TRUE, dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
#   theme_void() + 
#   labs(title = paste("AP2G (Commitment)")) + 
#   theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
#   scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

Marker Genes

useful tools for all plots

## define male and female symbol
female_symbol <- intToUtf8(9792)
male_symbol <- intToUtf8(9794)

## define colour pal
marker_gene_ramp <- colorRampPalette(c("#D3D3D3", "#1D1564"))(50)

Expression - cutoffs - purple


marker_gene_plot_CCP2 <- FeaturePlot(tenx.justwt.integrated, 
                                     features = "PBANKA-1319500", 
                                     coord.fixed = TRUE, 
                                     min.cutoff = "q05", 
                                     max.cutoff = "q95", 
                                     dims = c(1,2), 
                                     reduction = "umap", 
                                     pt.size = 0.5, 
                                     order = TRUE, 
                                     slot = 'scale.data') + 
                          coord_fixed() +
                          theme_void() + 
                          labs(title = expression(italic(ccp2)~(Female))) + 
                          theme(plot.title = element_text(hjust = 0.5), 
                                legend.text = element_text(size = 5), 
                                text=element_text(size=6.0), 
                                plot.margin = unit(c(0, 0, 0, 0), "cm")
                                ) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

marker_gene_plot_MG1 <- FeaturePlot(tenx.justwt.integrated, 
                                    features = "PBANKA-0416100", 
                                    coord.fixed = TRUE, 
                                    min.cutoff = "q05", 
                                    max.cutoff = "q95", 
                                    dims = c(1,2), 
                                    reduction = "umap", 
                                    pt.size = 0.5, 
                                    order = TRUE, 
                                    slot = 'scale.data') +
                        coord_fixed() + 
                        theme_void() + 
                        labs(title = expression(italic(mg1)~(Male))) + 
                        theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 5), text=element_text(size=6.0), plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
                        scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

marker_gene_plot_AP2G <- FeaturePlot(tenx.justwt.integrated, 
                                     features = "PBANKA-1437500", 
                                     coord.fixed = TRUE, 
                                     min.cutoff = "q05", 
                                     max.cutoff = "q95", 
                                     dims = c(1,2), 
                                     reduction = "umap", 
                                     pt.size = 0.5, 
                                     order = TRUE, 
                                     slot = 'scale.data') + 
                          coord_fixed() +
                          theme_void() + 
                          labs(title = expression(italic(ap2g)~(Commitment))) + 
                          theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 5), text=element_text(size=6.0), plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

marker_gene_plot_MSP1 <- FeaturePlot(tenx.justwt.integrated, 
                                     features = "PBANKA-0831000", 
                                     coord.fixed = TRUE, 
                                     min.cutoff = "q05", 
                                     max.cutoff = "q95", 
                                     dims = c(1,2), 
                                     reduction = "umap", 
                                     pt.size = 0.5, 
                                     order = TRUE, 
                                     slot = 'scale.data') + 
                          coord_fixed() +
                          theme_void() + 
                          labs(title = expression(italic(msp1)~(Schizont))) + 
                          theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 5), text=element_text(size=6.0), plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

marker_gene_plot_MSP8 <- FeaturePlot(tenx.justwt.integrated, 
                                     features = "PBANKA-1102200", 
                                     coord.fixed = TRUE, 
                                     min.cutoff = "q05", 
                                     max.cutoff = "q95", 
                                     dims = c(1,2), 
                                     reduction = "umap", 
                                     pt.size = 0.5, 
                                     order = TRUE, 
                                     slot = 'scale.data') + 
                          coord_fixed() +
                          theme_void() + 
                          labs(title = expression(italic(msp8)~(Asexual))) + 
                          theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 5), text=element_text(size=6.0), plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

marker_gene_plot_SBP1 <- FeaturePlot(tenx.justwt.integrated, 
                                     features = "PBANKA-1101300", 
                                     coord.fixed = TRUE, 
                                     min.cutoff = "q05", 
                                     max.cutoff = "q95", 
                                     dims = c(1,2), 
                                     reduction = "umap", 
                                     pt.size = 0.5, 
                                     order = TRUE,
                                     slot = 'scale.data') +
                          coord_fixed() + 
                          theme_void() + 
                          labs(title = expression(italic(SBP1)~(Ring))) + 
                          theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 5), text=element_text(size=6.0), plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

marker_gene_plot_FAMB <- FeaturePlot(tenx.justwt.integrated, 
                                     features = "PBANKA-0722600", 
                                     coord.fixed = TRUE, 
                                     min.cutoff = "q05", 
                                     max.cutoff = "q95", 
                                     dims = c(1,2), 
                                     reduction = "umap", 
                                     pt.size = 0.5, 
                                     order = TRUE,
                                     slot = 'scale.data') +
                          coord_fixed() + 
                          theme_void() + 
                          labs(title = expression(italic(fam-b2)~(Ring))) + 
                          theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 5), text=element_text(size=6.0), plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

marker_gene_plot_ORC1 <- FeaturePlot(tenx.justwt.integrated, 
                                     features = "PBANKA-0602000", 
                                     coord.fixed = TRUE, 
                                     min.cutoff = "q05", 
                                     max.cutoff = "q95", 
                                     dims = c(1,2), 
                                     reduction = "umap", 
                                     pt.size = 0.5, 
                                     order = TRUE,
                                     slot = 'scale.data') +
                          coord_fixed() + 
                          theme_void() + 
                          labs(title = expression(italic(orc1)~(DNA~replication))) + 
                          theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 5), text=element_text(size=6.0), plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

marker_gene_plot_CDPK5 <- FeaturePlot(tenx.justwt.integrated, 
                                      features = "PBANKA-1351500", 
                                      coord.fixed = TRUE, 
                                      min.cutoff = "q05", 
                                      max.cutoff = "q95", 
                                      dims = c(1,2), 
                                      reduction = "umap", 
                                      pt.size = 0.5, 
                                      order = TRUE, 
                                      slot = 'scale.data') +
                          coord_fixed() +
                          theme_void() + 
                          labs(title = expression(italic(cdpk5)~(Schizont))) + 
                          theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 5), text=element_text(size=6.0), plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

marker_gene_plot_MCM4 <- FeaturePlot(tenx.justwt.integrated, 
                                      features = "PBANKA-1415600", 
                                      coord.fixed = TRUE, 
                                      min.cutoff = "q05", 
                                      max.cutoff = "q95", 
                                      dims = c(1,2), 
                                      reduction = "umap", 
                                      pt.size = 0.5, 
                                      order = TRUE, 
                                      slot = 'scale.data') +
                          coord_fixed() +
                          theme_void() + 
                          labs(title = expression(italic(mcm4)~(DNA~replication))) + 
                          theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 5), text=element_text(size=6.0), plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

# + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

## plot
## cowplot method
marker_gene_plot_all <- plot_grid(marker_gene_plot_FAMB, marker_gene_plot_MSP8, marker_gene_plot_MSP1, marker_gene_plot_AP2G, marker_gene_plot_CCP2, marker_gene_plot_MG1, marker_gene_plot_HSP70, nrow=3)

marker_gene_plot_all

Mutant Genes

Expression - with cutoffs

ggsave("../images_to_export/ALL_CELLS_wt_gene_expression.png", plot = mutant_expression_composite, device = "png", path = NULL, scale = 1, width = 21, height = 29.7, units = "cm", dpi = 300, limitsize = TRUE)

save

ggsave("../images_to_export/ALL_CELLS_wt_gene_expression.png", plot = mutant_expression_composite, device = "png", path = NULL, scale = 1, width = 21, height = 29.7, units = "cm", dpi = 300, limitsize = TRUE)
# PBANKA-1418100        GCSKO-17  FD3   
# PBANKA-0102400         GCSKO-2  MD3 
# PBANKA-0716500        GCSKO-19  MD4 
# PBANKA-1435200        GCSKO-20  FD4 
# PBANKA-0902300        GCSKO-13  FD2
# PBANKA-0413400    GCSKO-10_820  MD5
# PBANKA-0828000         GCSKO-3  GD1
# PBANKA-1302700       GCSKO-oom  MD1 
# PBANKA-1447900        GCSKO-29  MD2
# PBANKA-1454800        GCSKO-21  FD1
# PBANKA-1144800        GCSKO-28  FD5


marker_gene_plot_17 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1418100", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("fd3")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_2 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0102400", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md3")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_19 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0716500", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md4")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_20 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1435200", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("fd4")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_13 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0902300", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("fd2")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_10 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0413400", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md5")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_3 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0828000", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("gd1")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_oom <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1302700", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md1")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_29 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1447900", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md2")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_21 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1454800", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("fd1")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

##original label:
# labs(title = paste("(CCP2; Female)","\n", "PBANKA_1319500"))

## make composite plot
mutant_expression_composite <- wrap_plots(marker_gene_plot_17 , marker_gene_plot_2 , marker_gene_plot_19 , marker_gene_plot_20 , marker_gene_plot_13 , marker_gene_plot_10 , marker_gene_plot_3 , marker_gene_plot_oom , marker_gene_plot_29 , marker_gene_plot_21 , ncol = 4)
           
## print
mutant_expression_composite

7. Pseudotime on all cells

Pseudotime calculation

## extract data from Seurat
seurat.object.all <- tenx.justwt.integrated
# counts
data <- as(as.matrix(GetAssayData(seurat.object.all, assay = "integrated", slot = "data")), 'sparseMatrix')
# meta data
pd <- data.frame(seurat.object.all@meta.data)

## keep only the columns that are relevant
#pData <- pd %>% select(orig.ident, nCount_RNA, nFeature_RNA)
## add gene short name
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))

## Construct monocle cds
monocle.object.all <- new_cell_data_set(expression_data = data, cell_metadata = pd, gene_metadata = fData)
## preprocess
monocle.object.all = preprocess_cds(monocle.object.all, num_dim = 100, norm_method = "none")
## plot variance explained plot
#plot_pc_variance_explained(monocle.object.all)
## make monocle UMAP
#monocle.object.all = reduce_dimension(monocle.object.all, reduction_method = "UMAP", preprocess_method = "PCA", umap.metric = "euclidean", umap.n_neighbors = 20, umap.min_dist = 0.5, verbose = FALSE)
#plot_cells(monocle.object.all)

## add UMAP from Seurat
monocle.object.all@int_colData@listData$reducedDims@listData[["UMAP"]] <-seurat.object.all@reductions[["umap"]]@cell.embeddings 
plot_cells(monocle.object.all)

## cluster
monocle.object.all = cluster_cells(monocle.object.all)

## plot clusters
plot_cells(monocle.object.all, color_cells_by="partition", group_cells_by="partition",  x = 1, y = 2)

## reduce partitions to 1
monocle.object.all@clusters$UMAP$partitions[monocle.object.all@clusters$UMAP$partitions == "2"] <- "1"

#map pseudotime
monocle.object.all = learn_graph(monocle.object.all, learn_graph_control=list(ncenter=550, minimal_branch_len = 15), use_partition = FALSE)

plot_cells(monocle.object.all, color_cells_by="partition", group_cells_by="partition",  x = 1, y = 2)

## a helper function to identify the root principal points:
## make cluster 2 the root
# get_earliest_principal_node <- function(cds, time_bin="7"){
#   cell_ids <- which(colData(cds)[, "seurat_clusters"] == time_bin)
#   closest_vertex <-
#   cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
#   closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
#   root_pr_nodes <-
#   igraph::V(principal_graph(cds)[["UMAP"]])$name[as.numeric(names
#   (which.max(table(closest_vertex[cell_ids,]))))]
#   
#   root_pr_nodes
# }

## calculate pseudotime
#monocle.object.all = order_cells(monocle.object.all, root_pr_nodes=get_earliest_principal_node(monocle.object.all))
monocle.object.all = order_cells(monocle.object.all)
## used 5 points at the beginning


## plot
umap_pt <- plot_cells(monocle.object.all, color_cells_by = "pseudotime", label_cell_groups=FALSE, cell_size = 1, x = 1, y = 2, label_branch_points=FALSE, label_leaves=FALSE, label_groups_by_cluster=FALSE, label_roots = FALSE) +
  coord_fixed() +
  theme_void() +
  labs(title = "") +
  theme(plot.title = element_text(hjust = 0.5, size=20), legend.position="bottom", legend.title=element_text (size=20), legend.text=element_text(size=20)) + 
  guides(colour = guide_colourbar(barwidth = 10, barheight = 2, title = "Pseudotime"))

## view plot
umap_pt

## help was obtained from here
## https://github.com/satijalab/seurat/issues/1658

save

ggsave("../images_to_export/UMAP_pt_wt.png", plot = umap_pt, device = "png", path = NULL, scale = 1, width = 20, height = 20, units = "cm", dpi = 300, limitsize = TRUE)

gganimnate GIF of pseuodtime

Select appropriate cells

plot <- DimPlot(tenx.justwt.integrated, label = FALSE, repel = TRUE, pt.size = 0.1, dims = c(1,2), reduction = "umap")
HoverLocator(plot, information = FetchData(object = tenx.justwt.integrated, vars = 'monocle_sex'))
the condition has length > 1 and only the first element will be usedthe condition has length > 1 and only the first element will be used`error_y.color` does not currently support multiple values.`error_x.color` does not currently support multiple values.`line.color` does not currently support multiple values.The titlefont attribute is deprecated. Use title = list(font = ...) instead.the condition has length > 1 and only the first element will be usedthe condition has length > 1 and only the first element will be used`error_y.color` does not currently support multiple values.`error_x.color` does not currently support multiple values.`line.color` does not currently support multiple values.The titlefont attribute is deprecated. Use title = list(font = ...) instead.
cells_to_remove <- c("SC26779_8_280", "CTGGTCTAGCAGATCG")
#install.packages("gganimate")
library(gganimate)
#install.packages("gifski")
#install.packages("av")
#library(gifski)
#library(av)

## make dataframe for plotting
## extract data for GGplot version of this
df_animation <- as.data.frame(monocle.object.all@int_colData@listData$reducedDims@listData[["UMAP"]])
## add pt to this data frame:
pt_values <- as.data.frame(pseudotime(monocle.object.all, reduction_method = "UMAP"))
df_animation <- merge(df_animation, pt_values, by="row.names") 
rownames(df_animation) <- df_animation$Row.names
colnames(df_animation)[4] <- "pt"

## make the static plot
p <- ggplot(df_animation, aes(x = UMAP_1, y = UMAP_2, colour = pt)) +
  geom_point() +
  scale_colour_viridis_c(option = "plasma") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "none")
## view plot
plot(p)

## make animated plot
## make a category for animation
#df_animation$group <- cut(df_animation$pt, 15)

anim <- p +
  transition_time(pt) +
  shadow_mark()

animate(anim, height = 3, width = 3, units = "in", res = 150, bg = 'transparent')

## to change the resolution - https://stackoverflow.com/questions/49058567/define-size-for-gif-created-by-gganimate-change-dimension-resolution 

Save animation

anim_save("animated_UMAP_transparent_bg_wt.gif", path = "../images_to_export/")
## extract pt values
pt_values <- as.data.frame(pseudotime(monocle.object.all, reduction_method = "UMAP"))

tenx.justwt.integrated <- AddMetaData(tenx.justwt.integrated, pt_values, "old_pt_values")

## make dataframe for plotting
## extract data for GGplot version of this
df_animation <- as.data.frame(tenx.justwt.integrated@reductions[["umap"]]@cell.embeddings)
## add pt to this data frame:
pt_values <- as.data.frame(tenx.justwt.integrated@meta.data[ ,c("old_pt_values", "pt_id_cols"), drop = FALSE])
df_animation <- merge(df_animation, pt_values, by="row.names") 
rownames(df_animation) <- df_animation$Row.names
colnames(df_animation)[4] <- "pt"
## remove outlier cells
df_animation <- df_animation[which(df_animation$Row.names %ni% cells_to_remove), ]

#df_animation <- df_animation[order(df_animation$pt, decreasing = TRUE), ]

## make a category for animation
#df_animation$group <- as.numeric(cut(df_animation$pt, 10, include.lowest = TRUE, labels = c(1:10)))

## make the static plot
p <- ggplot(df_animation, aes(x = UMAP_1, y = UMAP_2)) +
  geom_point(col = df_animation$pt_id_cols) +
  #scale_colour_viridis_c(option = "plasma") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "none") +
  view_static()
## view plot
plot(p)

## make animated plot


anim <- p +
  transition_time(pt) +
  shadow_mark() +
  #shadow_wake(wake_length = 0.99) #+
  enter_fade()

animate(anim, 
        height = 3, 
        width = 3, 
        units = "in", 
        res = 300, 
        bg = 'white', 
        #duration = 6, 
        fps = 20, 
        device = "png",
        #ref_frame = 1
        end_pause = 10
        )

Frame 1 (1%)
Frame 2 (2%)
Frame 3 (3%)
Frame 4 (4%)
Frame 5 (5%)
Frame 6 (6%)
Frame 7 (7%)
Frame 8 (8%)
Frame 9 (9%)
Frame 10 (10%)
Frame 11 (11%)
Frame 12 (12%)
Frame 13 (13%)
Frame 14 (14%)
Frame 15 (15%)
Frame 16 (16%)
Frame 17 (17%)
Frame 18 (18%)
Frame 19 (19%)
Frame 20 (20%)
Frame 21 (21%)
Frame 22 (22%)
Frame 23 (23%)
Frame 24 (24%)
Frame 25 (25%)
Frame 26 (26%)
Frame 27 (27%)
Frame 28 (28%)
Frame 29 (29%)
Frame 30 (30%)
Frame 31 (31%)
Frame 32 (32%)
Frame 33 (33%)
Frame 34 (34%)
Frame 35 (35%)
Frame 36 (36%)
Frame 37 (37%)
Frame 38 (38%)
Frame 39 (39%)
Frame 40 (40%)
Frame 41 (41%)
Frame 42 (42%)
Frame 43 (43%)
Frame 44 (44%)
Frame 45 (45%)
Frame 46 (46%)
Frame 47 (47%)
Frame 48 (48%)
Frame 49 (49%)
Frame 50 (50%)
Frame 51 (51%)
Frame 52 (52%)
Frame 53 (53%)
Frame 54 (54%)
Frame 55 (55%)
Frame 56 (56%)
Frame 57 (57%)
Frame 58 (58%)
Frame 59 (59%)
Frame 60 (60%)
Frame 61 (61%)
Frame 62 (62%)
Frame 63 (63%)
Frame 64 (64%)
Frame 65 (65%)
Frame 66 (66%)
Frame 67 (67%)
Frame 68 (68%)
Frame 69 (69%)
Frame 70 (70%)
Frame 71 (71%)
Frame 72 (72%)
Frame 73 (73%)
Frame 74 (74%)
Frame 75 (75%)
Frame 76 (76%)
Frame 77 (77%)
Frame 78 (78%)
Frame 79 (79%)
Frame 80 (80%)
Frame 81 (81%)
Frame 82 (82%)
Frame 83 (83%)
Frame 84 (84%)
Frame 85 (85%)
Frame 86 (86%)
Frame 87 (87%)
Frame 88 (88%)
Frame 89 (89%)
Frame 90 (90%)
Frame 91 (91%)
Frame 92 (92%)
Frame 93 (93%)
Frame 94 (94%)
Frame 95 (95%)
Frame 96 (96%)
Frame 97 (97%)
Frame 98 (98%)
Frame 99 (99%)
Frame 100 (100%)
Finalizing encoding... done!

Define cell identities with branches

Define identities of cells

male

monocle.object_male <- choose_graph_segments(monocle.object.all)

female

monocle.object_female <- choose_graph_segments(monocle.object.all)

bipotential

monocle.object_bipot <- choose_graph_segments(monocle.object.all)

asexual (pre-branch)

monocle.object_asex_pre <- choose_graph_segments(monocle.object.all)

asexual fate

monocle.object_asex_fate <- choose_graph_segments(monocle.object.all)

check

df_freq <- data.frame(table(c(colnames(monocle.object_male), colnames(monocle.object_female), colnames(monocle.object_bipot), colnames(monocle.object_asex_pre), colnames(monocle.object_asex_fate))))
paste("number of cells in seurat object is", length(colnames(monocle.object.all)), ". The number of cells selected here with an identitity is", dim(df_freq)[1])
df_freq <- df_freq[df_freq$Freq > 1, ]
df_freq

Inspect where these missing cells are:

# '%ni%' <- Negate('%in%')
# 
# not_assigned_cells <- colnames(monocle.object.all)[colnames(monocle.object.all) %ni% c(colnames(monocle.object_male), colnames(monocle.object_female), colnames(monocle.object_bipot), colnames(monocle.object_asex), colnames(monocle.object_asex_fate))]
# 
# DimPlot(seurat.object, repel = TRUE, label.size = 5, pt.size = 0.5, cells.highlight = not_assigned_cells, dims = c(2,1), reduction = "DIM_UMAP") +
#   coord_fixed() + 
#   scale_color_manual(values=c("#000000", "#f54e1e"))
## create annotation dataframe from these results:
df_monocle_sexes <- rbind(data.frame("cell_name" = colnames(monocle.object_male), "sex" = rep("Male", length(colnames(monocle.object_male)))),
                          data.frame("cell_name" = colnames(monocle.object_female), "sex" = rep("Female", length(colnames(monocle.object_female)))),
                          data.frame("cell_name" = colnames(monocle.object_bipot), "sex" = rep("Bipotential", length(colnames(monocle.object_bipot)))),
                          data.frame("cell_name" = colnames(monocle.object_asex_pre), "sex" = rep("Asexual_Early", length(colnames(monocle.object_asex_pre)))),
                          data.frame("cell_name" = colnames(monocle.object_asex_fate), "sex" = rep("Asexual_Late", length(colnames(monocle.object_asex_fate))))
                          #data.frame("cell_name" = not_assigned_cells, "sex" = rep("Unassigned", length(not_assigned_cells)))
                          )

dim(df_monocle_sexes)

## order like the metadata
df_monocle_sexes <- df_monocle_sexes[match(rownames(monocle.object.all@colData), df_monocle_sexes$cell_name), ]

## add this back into the monocle object
monocle.object.all@colData$Sexes_monocle <- df_monocle_sexes$sex

## add this to the seurat object
rownames(df_monocle_sexes) <- df_monocle_sexes$cell_name
df_monocle_sexes_to_add_to_seurat <- df_monocle_sexes[,c("sex"), drop = FALSE]
tenx.justwt.integrated <- AddMetaData(tenx.justwt.integrated, df_monocle_sexes_to_add_to_seurat, col.name = "monocle_sex")

pt vs. real time

gg <- ggplot(df_plotting, aes(as.factor(Prediction.Spearman.), fill = monocle_sex)) + geom_bar(stat = "count", position="fill")

gg

gg <- ggplot(df_plotting, aes(as.factor(Prediction.Spearman.), fill = monocle_sex)) + geom_bar(stat = "count", position="fill")

gg

8. Plots

make composite pseudotime/ID figure

# 1 = blue - "#0052c5"
# 2 = red - "#a52b1e"
# 3 = green - "#016c00"
# 4 = yellow - "#ffe400"
#pal_sex <- c("#0052c5","#ffe400", "#a52b1e", "#016c00")

## extract pseudotime numbers and identity of cells to a dataframe
df_pt_id <- tenx.justwt.integrated@meta.data[,c("old_pt_values", "monocle_sex")]

## inspect possible values
list_of_sexes <- names(table(df_pt_id$monocle_sex))

## make a new column
df_pt_id$colour <- NA

## make colour ramps
asex_ramp <- colorRampPalette(c("#D5E3F5", "#0052c5"))
male_ramp <- colorRampPalette(c("white", "yellow", "#016c00"))
female_ramp <- colorRampPalette(c("yellow", "#a52b1e"))
bipot_ramp <- colorRampPalette(c("white", "#ffe400"))

## re-classify the cells that are unassigned cells removed from sexual branch above:
#df_pt_id[which(rownames(df_pt_id) %in% remove_cells), ]$monocle_sex <- "Asexual"

## assign values to each cluster
## help here: https://stackoverflow.com/questions/9946630/colour-points-in-a-plot-differently-depending-on-a-vector-of-values 
df_pt_id[df_pt_id$monocle_sex == "Asexual_Early" | df_pt_id$monocle_sex == "Asexual_Late", ]$colour <- asex_ramp(100)[as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Asexual_Early" | df_pt_id$monocle_sex == "Asexual_Late", ]$old_pt_values,breaks = 100))]

df_pt_id[df_pt_id$monocle_sex == "Male", ]$colour <- male_ramp(100)[as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Male", ]$old_pt_values,breaks = 100))]

df_pt_id[df_pt_id$monocle_sex == "Female", ]$colour <- female_ramp(100)[as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Female", ]$old_pt_values,breaks = 100))]

df_pt_id[df_pt_id$monocle_sex == "Bipotential", ]$colour <- bipot_ramp(100)[as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Bipotential", ]$old_pt_values,breaks = 100))]

## check everything has a value
#table(is.na(df_pt_id$colour))

## make into a df
#df_pt_id <- df_pt_id[ ,"colour", drop = FALSE]

## add back to seurat object
df <- df_pt_id[ ,"colour", drop = FALSE]
tenx.justwt.integrated <- AddMetaData(tenx.justwt.integrated, df, "pt_id_cols")
rm(df)

## plot
## extract UMAP coords
df_umap_plot <- tenx.justwt.integrated@reductions[["umap"]]@cell.embeddings
df_umap_plot <- merge(df_umap_plot, df_pt_id, by=0, all=TRUE)

## add tree
##The tree for monocle is located here:
# monocle.object@principal_graph_aux[["UMAP"]]$dp_mst 
ica_space_df <- t(monocle.object.all@principal_graph_aux[["UMAP"]]$dp_mst) %>%
      as.data.frame() %>%
      dplyr::select_(prin_graph_dim_1 = "UMAP_1", prin_graph_dim_2 = "UMAP_2") %>%
      dplyr::mutate(sample_name = rownames(.),
                    sample_state = rownames(.))

dp_mst <- monocle.object.all@principal_graph[["UMAP"]]

edge_df <- dp_mst %>%
      igraph::as_data_frame() %>%
      dplyr::select_(source = "from", target = "to") %>%
      dplyr::left_join(ica_space_df %>%
                         dplyr::select_(
                           source="sample_name",
                           source_prin_graph_dim_1="prin_graph_dim_1",
                           source_prin_graph_dim_2="prin_graph_dim_2"),
                       by = "source") %>%
      dplyr::left_join(ica_space_df %>%
                         dplyr::select_(
                           target="sample_name",
                           target_prin_graph_dim_1="prin_graph_dim_1",
                           target_prin_graph_dim_2="prin_graph_dim_2"),
                       by = "target")

## make ggplot
umap_id_pt <- ggplot(df_umap_plot, aes(x = UMAP_1, y = UMAP_2)) + 
                     geom_point(col = df_umap_plot$colour) +
                     theme_void() +
                     coord_fixed() +
                     geom_segment(aes_string(x="source_prin_graph_dim_1",
                                     y="source_prin_graph_dim_2",
                                     xend="target_prin_graph_dim_1",
                                     yend="target_prin_graph_dim_2"),
                          data=edge_df)

umap_id_pt

save

#ggsave("../images_to_export/umap_id_pt.png", plot = umap_id_pt, device = "png", path = NULL, scale = 1, width = 21, height = 29.7, units = "cm", dpi = 300, limitsize = TRUE)

cluster plot

### Prepare data

## extract df with cluster ID, 
df_pt_id <- tenx.justwt.integrated@meta.data[ ,"seurat_clusters", drop=FALSE]
## extract UMAP coords
df_plot <- tenx.justwt.integrated@reductions[["umap"]]@cell.embeddings
df_plot <- merge(df_plot, df_pt_id, by=0, all=TRUE)
## add colour col
df_plot$colour <- df_plot$seurat_clusters
levels(df_plot$colour) <- list(print(asex_ramp(17)[1]) = "9", asex_ramp(17)[2]="4", 
                               asex_ramp(17)[3]="15", 
                                asex_ramp(17)[4] = "8", 
                                asex_ramp(17)[5]= "1",
                                asex_ramp(17)[6]=  "14", 
                                asex_ramp(17)[7] = "2", 
                                asex_ramp(17)[8] = "10",
                                asex_ramp(17)[9] = "3",
                                asex_ramp(17)[10] = "0",
                                asex_ramp(17)[11] = "6",
                                asex_ramp(17)[12] = "5",
                                asex_ramp(17)[13] = "7",
                                asex_ramp(17)[14] = "12",
                                asex_ramp(17)[15] = "18",
                                asex_ramp(17)[16] = "20",
                                asex_ramp(17)[17] = "23",
                                bipot_ramp(3)[2] = "11",
                                male_ramp(3)[2] = "16",
                                male_ramp(3)[3] = "13",
                                female_ramp(4)[2] = "21",
                                female_ramp(4)[3] = "22",
                                female_ramp(4)[4] = "19",
                                female_ramp(4)[4] = "17"
                                      )
## aabbreviate clusters
df_plot$cluster_names <- df_plot$seurat_clusters
levels(df_plot$cluster_names) <- list(A_1="9", 
                                      A_2="4", 
                                      A_3="15", 
                                      A_4 = "8", 
                                      A_5 = "1",
                                      A_6=  "14", 
                                      A_7 = "2", 
                                      A_8 = "10",
                                      A_9 = "3",
                                      A_10 = "0",
                                      A_11 = "6",
                                      A_12 = "5",
                                      A_13 = "7",
                                      A_14 = "12",
                                      A_15 = "18",
                                      A_16 = "20",
                                      A_17 = "23",
                                      B = "11",
                                      M_1 = "16",
                                      M_2 = "13",
                                      F_1 = "21",
                                      F_2 = "22",
                                      F_3 = "19",
                                      F_3 = "17"
                                      )

library(dplyr)
dat%>%
group_by(custid)%>% 
summarise(Mean=mean(value), Max=max(value), Min=min(value), Median=median(value), Std=sd(value))



## plot
umap_id <- ggplot(df_plot, aes(x = UMAP_1, y = UMAP_2)) + 
                     geom_point(col = df_plot$colour) +
                     theme_void() +
                     coord_fixed()

umap_id
ggsave("../images_to_export/umap_with_clusters.png", plot = umap_with_clusters, device = "png", path = NULL, scale = 1, width = 21, height = 29.5, units = "cm", dpi = 300, limitsize = TRUE)

save

ggsave("../images_to_export/umap_with_clusters.png", plot = umap_with_clusters, device = "png", path = NULL, scale = 1, width = 21, height = 29.5, units = "cm", dpi = 300, limitsize = TRUE)
## extract proportion of cells
library(plyr)
df_prop_comb <- as.data.frame(table(tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting, tenx.justwt.integrated@meta.data$experiment))
names(df_prop_comb) <- c("cluster", "experiment", "Freq")

## calcualte the percentage
df_prop_comb$pc <- NA
df_prop_comb[df_prop_comb$experiment == "mutants", ]$pc <- (df_prop_comb[df_prop_comb$experiment == "mutants", ]$Freq/sum(df_prop_comb[df_prop_comb$experiment == "mutants", ]$Freq))*100
df_prop_comb[df_prop_comb$experiment == "tenx_5k", ]$pc <- (df_prop_comb[df_prop_comb$experiment == "tenx_5k", ]$Freq/sum(df_prop_comb[df_prop_comb$experiment == "tenx_5k", ]$Freq))*100

## reorder levels in Var1 
#df_prop_comb$cluster <- factor(df_prop_comb$cluster, levels = c("unassigned", "M", "F", "5", "7", "4", "3", "1", "0", "2", "6"))

## then reorder by this so the cumsum will work below
df_prop_comb <- df_prop_comb[rev(order(df_prop_comb$experiment, df_prop_comb$cluster)),]

## Calculate the cumulative sum of len for each dose
df_cumsum <- ddply(df_prop_comb, "experiment", transform, label_ypos=cumsum(pc) - 0.5*pc)
head(df_cumsum)
# http://www.sthda.com/english/wiki/ggplot2-barplots-quick-start-guide-r-software-and-data-visualization

## reverse the levels in Var1
#df_cumsum$cluster <- factor(df_cumsum$cluster, levels = rev(c(levels(df_cumsum$cluster))))
#df_cumsum <- df_cumsum[c(match(df_cumsum$cluster[1:23], c(levels(df_cumsum$cluster))), match(df_cumsum$cluster[1:23], c(levels(df_cumsum$cluster))) + 23), ]

library(ggrepel)
library(ggpubr)
## make plot
plot_prop <- ggplot(data=df_cumsum, aes(x=experiment, y=pc, fill=cluster)) +
  geom_bar(stat="identity")+
  geom_label_repel(aes(label = Freq, y=label_ypos, fill = factor(cluster)), color = c(rep(c(rep(c("#ffffff"), 2), "#000000", "#ffffff", rep(c("#000000"), 2), rep(c("#ffffff"), 5), rep(c("#000000"), 12)), 2)), size = 3.5, show.legend = FALSE) +
  #geom_text(aes(y=label_ypos, label=Freq), vjust=0, color="black", size=3.5) +
  labs(fill = "Cell cluster", y= "Cell Proportions (%)", x = "Technology") +
  scale_fill_manual(labels = c("Asexual 1", "Asexual 2" , "Asexual 3", "Asexual 4", "Asexual 5", "Asexual 6", "Asexual 7", "Asexual 8",  "Asexual 9", "Asexual 10", "Asexual 11", "Asexual 12", "Asexual 13", "Asexual 14", "Asexual 15", "Asexual 16", "Asexual 17", "Progenitor", "Male 1", "Male 2", "Female 1", "Female 2", "Female 3"), values = pal_plot) +
  theme_pubr() +
  theme(legend.text=element_text(size=10), legend.position="bottom") +
  scale_x_discrete(labels= rev(c("10x", "Smart-seq2"))) +
  coord_flip() +
  guides(fill=guide_legend(ncol=9))


plot_prop
ggsave("../images_to_export/WT_cell_type_proportions.png", plot = plot_prop, device = "png", path = NULL, scale = 1, width = 30, height = 9, units = "cm", dpi = 300, limitsize = TRUE)

dotplot

A dotplot allows us to look at the expression of multiple genes in a clearer way than succesive UMAP plots

### Data set-up

# PBANKA-0828000         GCSKO-3  GD1

# PBANKA-1302700       GCSKO-oom  MD1 
# PBANKA-1447900        GCSKO-29  MD2
# PBANKA-0413400    GCSKO-10_820  MD3
# PBANKA-0716500        GCSKO-19  MD4 
# PBANKA-0102400         GCSKO-2  MD5 

# PBANKA-1454800        GCSKO-21  FD1
# PBANKA-0902300        GCSKO-13  FD2
# PBANKA-1418100        GCSKO-17  FD3   
# PBANKA-1435200        GCSKO-20  FD4 

# PBANKA-1437500 - AP2G - commitment
# PBANKA-1319500 - CCP2 - female - used in 820 line
# PBANKA-0416100 - MG1 - dynenin heavy chain - male - used in 820 line
# PBANKA-0831000 - MSP1 - late asexual
# PBANKA-1102200 - MSP8 - early asexual (from Bozdech paper)

marker_genes_list <- c("PBANKA-1437500", "PBANKA-1319500", "PBANKA-0416100", "PBANKA-0831000", "PBANKA-1102200")
mutant_genes_list <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0413400", "PBANKA-0716500", "PBANKA-0102400",  "PBANKA-1454800", "PBANKA-0902300", "PBANKA-1418100", "PBANKA-1435200")

## these get defined later on, but are replicated above here for plotting
asexual_early_clusters <- c(9, 4, 15, 8, 1, 14, 2, 10, 3, 0, 6, 5)
asexual_late_clusters <- c(7, 12, 18, 20, 23)
bipotentional_early_clusters <- # 0?
bipotential_clusters <- c(11) 
male_clusters <- c(16, 13)
female_clusters <- c(21, 22, 17, 19)

## copy the clusters so you don't permanently edit the master
tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting <- tenx.justwt.integrated@meta.data$seurat_clusters

## reorder the levels so you can plot the cluters as you wish
my_levels <- c(asexual_early_clusters,asexual_late_clusters, bipotential_clusters, male_clusters, female_clusters)

## reorder the levels
tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting <- factor(x = tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting, levels = my_levels)

## rename clusters so that they are intuitive
# this trick is from here: http://www.cookbook-r.com/Manipulating_data/Renaming_levels_of_a_factor/
levels(tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting) <- list(Asexual_1="9", 
                                                                              Asexual_2="4", 
                                                                              Asexual_3="15", 
                                                                              Asexual_4 = "8", 
                                                                              Asexual_5 = "1",
                                                                              Asexual_6=  "14", 
                                                                              Asexual_7 = "2", 
                                                                              Asexual_8 = "10",
                                                                              Asexual_9 = "3",
                                                                              Asexual_10 = "0",
                                                                              Asexual_11 = "6",
                                                                              Asexual_12 = "5",
                                                                              Asexual_13 = "7",
                                                                              Asexual_14 = "12",
                                                                              Asexual_15 = "18",
                                                                              Asexual_16 = "20",
                                                                              Asexual_17 = "23",
                                                                              Progenitor = "11",
                                                                              Male_1 = "16",
                                                                              Male_2 = "13",
                                                                              Female_1 = "21",
                                                                              Female_2 = "22",
                                                                              Female_3 = "19",
                                                                              Female_3 = "17"
                                                                              )

### Annotation set-up

## extract pseudotime numbers and identity of cells to a dataframe
df_pt_id <- tenx.justwt.integrated@meta.data[,c("old_pt_values", "monocle_sex", "seurat_clusters_dot_plotting", "Prediction.Spearman.", "Prediction.Spearman._Kasia")]

## make a new column
df_pt_id$colour <- NA

## assign bins to each of the values
## help here: https://stackoverflow.com/questions/9946630/colour-points-in-a-plot-differently-depending-on-a-vector-of-values 
df_pt_id[df_pt_id$monocle_sex == "Asexual_Early" | df_pt_id$monocle_sex == "Asexual_Late", ]$colour <- as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Asexual_Early" | df_pt_id$monocle_sex == "Asexual_Late", ]$old_pt_values,breaks = 100))

df_pt_id[df_pt_id$monocle_sex == "Male", ]$colour <- as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Male", ]$old_pt_values,breaks = 100))

df_pt_id[df_pt_id$monocle_sex == "Female", ]$colour <- as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Female", ]$old_pt_values,breaks = 100))

df_pt_id[df_pt_id$monocle_sex == "Bipotential", ]$colour <- as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Bipotential", ]$old_pt_values,breaks = 100))

## make colour ramps
asex_ramp <- colorRampPalette(c("#D5E3F5", "#0052c5"))
male_ramp <- colorRampPalette(c("white", "yellow", "#016c00"))
female_ramp <- colorRampPalette(c("yellow", "#a52b1e"))
bipot_ramp <- colorRampPalette(c("white", "#ffe400"))

## assign values to each cluster
## take the mean of the bin
df_annotation <- aggregate(df_pt_id[, "colour"], list(df_pt_id$seurat_clusters_dot_plotting), mean)
## BECAUSE we have ordered the clusters already, we can simply take the row index for this bit
df_annotation$colour <- NA
df_annotation[1:17, ]$colour <- asex_ramp(100)[df_annotation[1:17, ]$x]
df_annotation[18, ]$colour <- bipot_ramp(100)[df_annotation[18, ]$x]
df_annotation[19:20, ]$colour <- male_ramp(100)[df_annotation[19:20, ]$x]
df_annotation[21:23, ]$colour <- female_ramp(100)[df_annotation[21:23, ]$x]

## plot annotation 
## this really helped: https://www.biostars.org/p/396810/
h2 <- ggplot(df_annotation)+
  geom_bar(mapping = aes(x = Group.1, y = 1), 
           stat = "identity",
           fill = df_annotation$colour,
           col = "#FFFFFF",
           width = 1)+
      theme_void()+
      theme(panel.spacing.x = unit(1, "mm"))#+
     #facet_grid(.~colour, scales = "free_x")
     #legend <- plot_grid(get_legend(h2), get_legend(h1), ncol = 1)
 h2 <- h2 + theme(legend.position = "none")
 #   geom_point(col = df_umap_plot$colour)
 
 h1 <- ggplot(df_annotation)+
  geom_point(mapping = aes(x = Group.1, y = 1),
            col = df_annotation$colour,
            size = 5)+
      theme_void()+
      theme(panel.spacing.x = unit(1, "mm"))#+
     #facet_grid(.~colour, scales = "free_x")
     #legend <- plot_grid(get_legend(h2), get_legend(h1), ncol = 1)
 h1 <- h1 + theme(legend.position = "none")
 #   geom_point(col = df_umap_plot$colour)

 ## add predicted time point
 ## take the mean of the bin
df_annotation <- aggregate(df_pt_id[, "Prediction.Spearman."], list(df_pt_id$seurat_clusters_dot_plotting), mean)
df_annotation$colour <- NA
df_annotation$colour <- viridis(300)[(df_annotation$x)*10]

h3 <- ggplot(df_annotation)+
  geom_bar(mapping = aes(x = Group.1, y = 1), 
           stat = "identity",
           fill = df_annotation$colour,
           col = "#FFFFFF",
           width = 1)+
      theme_void()+
      theme(panel.spacing.x = unit(1, "mm"))#+
     #facet_grid(.~colour, scales = "free_x")
     legend_h3 <- get_legend(h3)
 h3 <- h3 + theme(legend.position = "none")

  h3 <- ggplot(data = df_annotation, 
              mapping = aes(x = Group.1, fill=x)
              ) +
       geom_bar(col = "#FFFFFF", width = 1)+
       theme_void() +
       theme(panel.spacing.x = unit(1, "mm")) +
       scale_fill_viridis(option = 'cividis')
 legend_h3 <- get_legend(h3)
 h3 <- h3 + theme(legend.position = "none")
 
 ## add kasia data
  ## take the mean of the bin
df_annotation <- aggregate(df_pt_id[, "Prediction.Spearman._Kasia"], list(df_pt_id$seurat_clusters_dot_plotting), mean)
df_annotation$colour <- NA
df_annotation$colour <- viridis(300)[(df_annotation$x)*10]

h4 <- ggplot(df_annotation)+
  geom_bar(mapping = aes(x = Group.1, y = 1), 
           stat = "identity",
           fill = df_annotation$colour,
           col = "#FFFFFF",
           width = 1)+
      theme_void()+
      theme(panel.spacing.x = unit(1, "mm"))+
     facet_grid(.~colour, scales = "free_x")
     legend <- plot_grid(get_legend(h4), ncol = 1)
    legend_h4 <- get_legend(h4)
 h4 <- h4 + theme(legend.position = "none")

 h4 <- ggplot(data = df_annotation, 
              mapping = aes(x = Group.1, fill=x)
              ) +
       geom_bar(col = "#FFFFFF", width = 1)+
       theme_void() +
       theme(panel.spacing.x = unit(1, "mm")) +
       scale_fill_viridis()
 legend_h4 <- get_legend(h4)
 h4 <- h4 + theme(legend.position = "none")

### Plot
dot_plot_markers <- DotPlot(tenx.justwt.integrated, 
                            features = c(marker_genes_list, rev(mutant_genes_list)), 
                            group.by = "seurat_clusters_dot_plotting", 
                            dot.min = 0.00001,
                            assay = 'RNA') +
  theme_classic() +
  coord_fixed() +
  coord_flip() +
  # change appearance and remove axis elements, and make room for arrows
  theme(axis.text.x = element_text(size=16, angle = 45, hjust=1,vjust=1, family = "Arial"), 
        axis.text.y = element_text(size=16, face="italic"), 
        text=element_text(size=16, family="Arial", colour="black"), 
        legend.position = "bottom", 
        legend.direction = "horizontal", 
        legend.box = "vertical", 
        plot.title = element_blank(), 
        plot.margin = unit(c(1,3,1,3), "lines"), 
        axis.line = element_blank(),
        panel.border = element_rect(colour = "black", fill=NA, size=0.5)) +
  #change the colours
  #scale_colour_viridis(option = "inferno", guide = "colourbar", na.value="white", begin = 0, end = 1, direction = 1) +
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## change x axis label
  # labs(x = "Marker Genes", y = "Cluster", title = "Expression of Marker Genes by Cluster") 
  labs(x = "", y = "", title = "") +
  ## add arrows
  #annotate("segment", x = 5.5, xend = 5.5, y = 21.5, yend = 25, colour = "green", size=1, alpha=1, arrow=arrow(length=unit(0.30,"cm"), type = "closed")) +
  #annotate("segment", x = 5.5, xend = 5.5, y = 16.5, yend = 21.5, colour = "red", size=1, alpha=1, arrow=arrow(length=unit(0.30,"cm"), type = "closed")) +
  #annotate("segment", x = 5.5, xend = 5.5, y = 0, yend = 15.5, colour = "grey", size=1, alpha=1, arrow=arrow(length=unit(0.30,"cm"), type = "closed")) +
  ## annotate asex
  geom_hline(aes(yintercept = (length(c(asexual_early_clusters, asexual_late_clusters))+0.5)), size = 0.5, linetype= 'dashed') +
  ## annotate bipotential
  geom_hline(aes(yintercept = (length(c(asexual_early_clusters, asexual_late_clusters, bipotential_clusters))+0.5)), size = 0.5, linetype= 'dashed') +
  ## annotate sexes
  geom_hline(aes(yintercept = (length(c(asexual_early_clusters, asexual_late_clusters, bipotential_clusters, male_clusters))+0.5)), size = 0.5, linetype= 'dashed') +
  ## change label on bottom of plot so we can indicate markers
  scale_x_discrete(labels = rev(c("gd1", "md1", "md2", "md3", "md4", "md5", "fd1", "fd2", "fd3", "fd4", "msp8", "msp1", "mg1", "ccp2", "ap2g"))) +
  ## change label on bottom of plot so we can indicate markers
  scale_y_discrete(labels = c("Asexual 1", "Asexual 2" , "Asexual 3", "Asexual 4", "Asexual 5", "Asexual 6", "Asexual 7", "Asexual 8",  "Asexual 9", "Asexual 10", "Asexual 11", "Asexual 12", "Asexual 13", "Asexual 14", "Asexual 15", "Asexual 16", "Asexual 17", "Progenitor", "Male 1", "Male 2", "Female 1", "Female 2", "Female 3")) +
  ## change name of legends
  guides(col=guide_colorbar(title = 'Scaled Average Expression'),
         size=guide_legend("% of cells expressing"))

## view
#print(dot_plot_markers)

plot <- plot_grid(h4, h3, h1, dot_plot_markers, align = "v", ncol = 1, axis = "tb", rel_heights = c(0.5, 0.5, 0.5, 18)) 

dot_plot <- plot_grid(plot, legend_h3, legend_h4, nrow = 1, rel_widths = c(10, 1.5, 1.5))

dot_plot

save

ggsave("../images_to_export/dot_plot_all.png", plot = dot_plot, device = "png", path = NULL, scale = 1, width = 29.7, height = 21, units = "cm", dpi = 300, limitsize = TRUE)

Make individual plots highlighting where cells in each cluster fall

plot

## this function writes the next bit of code for you
## put it into the console and paste the response
#ploty <- c()
#for(i in seq_along(levels(tenx.justwt.integrated@meta.data$seurat_clusters))){
#  ploty <- paste0(ploty, "list_UMAPs_by_cluster[[", i, "]]", " + ")
#}

## plot
composite_plot <- plot_grid(list_UMAPs_by_cluster[[1]], list_UMAPs_by_cluster[[2]], list_UMAPs_by_cluster[[3]], list_UMAPs_by_cluster[[4]], list_UMAPs_by_cluster[[5]], list_UMAPs_by_cluster[[6]], list_UMAPs_by_cluster[[7]], list_UMAPs_by_cluster[[8]], list_UMAPs_by_cluster[[9]], list_UMAPs_by_cluster[[10]], list_UMAPs_by_cluster[[11]], list_UMAPs_by_cluster[[12]], list_UMAPs_by_cluster[[13]], list_UMAPs_by_cluster[[14]], list_UMAPs_by_cluster[[15]], list_UMAPs_by_cluster[[16]], list_UMAPs_by_cluster[[17]], list_UMAPs_by_cluster[[18]], list_UMAPs_by_cluster[[19]], list_UMAPs_by_cluster[[20]], list_UMAPs_by_cluster[[21]], list_UMAPs_by_cluster[[22]], list_UMAPs_by_cluster[[23]], ncol = 8)

composite_plot

save

ggsave("../images_to_export/clusters_composite_plot.png", plot = composite_plot, device = "png", path = NULL, scale = 1, width = 21, height = 10, units = "cm", dpi = 300, limitsize = TRUE)

9. Subset sexual cells

Make a subsetted Seurat object of sexual cells.

Include the pre-branch too as well as any weird clusters that may have clustered out.

it’s been a while since we looked at the clusters so let’s check them out again:

## Plot
DimPlot(tenx.justwt.integrated, label = TRUE, repel = FALSE, pt.size = 0.05, group.by = "seurat_clusters", dims = c(1,2), reduction = "umap") + coord_fixed()

## plot
list_UMAPs_by_cluster[[1]] + list_UMAPs_by_cluster[[2]] + list_UMAPs_by_cluster[[3]] + list_UMAPs_by_cluster[[4]] + list_UMAPs_by_cluster[[5]] + list_UMAPs_by_cluster[[6]] + list_UMAPs_by_cluster[[7]] + list_UMAPs_by_cluster[[8]] + list_UMAPs_by_cluster[[9]] + list_UMAPs_by_cluster[[10]] + list_UMAPs_by_cluster[[11]] + list_UMAPs_by_cluster[[12]] + list_UMAPs_by_cluster[[13]] + list_UMAPs_by_cluster[[14]] + list_UMAPs_by_cluster[[15]] + list_UMAPs_by_cluster[[16]] + list_UMAPs_by_cluster[[17]] + list_UMAPs_by_cluster[[18]] + list_UMAPs_by_cluster[[19]] + list_UMAPs_by_cluster[[20]] + list_UMAPs_by_cluster[[21]] + list_UMAPs_by_cluster[[22]] + list_UMAPs_by_cluster[[23]] + list_UMAPs_by_cluster[[24]]

Define cells and subset

asexual_early_clusters <- c(9, 4, 15, 8, 1, 14, 2, 10, 3, 0, 6, 5)
asexual_late_clusters <- c(7, 12, 18, 20, 23)
bipotentional_early_clusters <- # 0?
bipotential_clusters <- c(11) 
male_clusters <- c(16, 13)
female_clusters <- c(21, 22, 17, 19)

## define cells
cell_names_subset_monocle_ids <- rownames(tenx.justwt.integrated@meta.data[tenx.justwt.integrated@meta.data$monocle_sex %in% c("Asexual_Early", "Bipotential", "Male", "Female"), ])

## 3, 0, 6, 5 are early clusters of asexuals before the branch
cell_names_subset_cluster_ids <- rownames(tenx.justwt.integrated@meta.data[tenx.justwt.integrated@meta.data$seurat_clusters %in% c(male_clusters, female_clusters, bipotential_clusters, 3, 0, 6, 5), ])

cell_names_subset_intersect <- intersect(cell_names_subset_monocle_ids, cell_names_subset_cluster_ids)

## subset cells into new object
tenx.justwt.integrated.sex <- subset(tenx.justwt.integrated, cells = cell_names_subset_intersect)

inspect/check

## inspect object
tenx.justwt.integrated.sex

## look at original UMAP
DimPlot(tenx.justwt.integrated.sex, label = TRUE, repel = TRUE, pt.size = 0.1, split.by = "experiment", dims = c(1,2), reduction = "umap") + coord_fixed()

Remove contaminant asexual cells

we want to remove:

## look at original UMAP
plot_sexual_subsetting <- DimPlot(tenx.justwt.integrated.sex, label = TRUE, repel = TRUE, pt.size = 0.1, dims = c(1,2), reduction = "umap") + 
  coord_fixed() + 
  geom_vline(aes(xintercept = -1, alpha = 5))

plot_sexual_subsetting
## extract cell embeddings
df_sex_cell_embeddings <- as.data.frame(tenx.justwt.integrated.sex@reductions[["umap"]]@cell.embeddings)

## subset anything lower than -0.8 in UMAP 2 and -0.1 in UMAP 1
remove_cells <- row.names(df_sex_cell_embeddings[which(df_sex_cell_embeddings$UMAP_1 < -1), ])

## plot these cells
DimPlot(tenx.justwt.integrated.sex, label = FALSE, repel = TRUE, pt.size = 0.1, cells.highlight = remove_cells, dims = c(1,2), reduction = "umap") + 
  coord_fixed() + 
  scale_color_manual(values=c("#000000", "#f54e1e")) + 
  theme_void() + 
  labs(title = paste("cells highlighted will be removed")) + 
  theme(plot.title = element_text(hjust = 0.5), legend.position = "none")

Final Subset

## make keep cells from the remove_cells
## make the not in function
'%ni%' <- Negate('%in%')
keep_cells <- colnames(tenx.justwt.integrated.sex)[which(colnames(tenx.justwt.integrated.sex) %ni% remove_cells)]

## subset
tenx.justwt.integrated.sex <- subset(tenx.justwt.integrated.sex, cells = keep_cells)

## inspect
tenx.justwt.integrated.sex

copy old clusters over

## copy old clusters
tenx.justwt.integrated.sex <- AddMetaData(tenx.justwt.integrated.sex, tenx.justwt.integrated.sex@meta.data$seurat_clusters, col.name = "post_integration_clusters")

10. Sex assignment of mutants

A. Seurat Method

## see how this overlaps with fluorescence sorted on for 820 to confirm accuracy
df_820 <- df[df$genetic_background =="PBANKA_820", ]
table(df_820$predicted.id, df_820$fluoresence_sorted_on)
               
                GFP Hoechst mCherry
  Asexual_Early   0      69       1
  Asexual_Late   33     459      16
  Female          2     342     240
  Male          182      49       0
## see how this overlaps with fluorescence sorted on for 820 to confirm accuracy
df_820 <- df[df$genetic_background =="PBANKA_820", ]
table(df_820$predicted.id, df_820$fluoresence_sorted_on)

inspect gd1

df[which(df$predicted.id == "Female_1" & df$identity_name_updated == "gd1"), ][, 118:146]

Calculate sex ratios

## use designations above for sexes
male_cells <- rownames(mutant_seurat@meta.data[mutant_seurat@meta.data$predicted.id == "Male", ])
female_cells <- rownames(mutant_seurat@meta.data[mutant_seurat@meta.data$predicted.id == "Female", ])
ss2_mutants_final_male <- subset(mutant_seurat, cells = male_cells)
ss2_mutants_final_female <- subset(mutant_seurat, cells = female_cells)

## inspect
ss2_mutants_final_male
ss2_mutants_final_female
## calculate sex ratios
##subset out H, sorted cells:
df_male <- ss2_mutants_final_male@meta.data[ss2_mutants_final_male@meta.data$exclude_for_sex_ratio == FALSE,]

dim(df_male)

df_female <- ss2_mutants_final_female@meta.data[ss2_mutants_final_female@meta.data$exclude_for_sex_ratio == FALSE,]

dim(df_female)

## make dataframe
df_sex_ratio <- merge(
  as.data.frame(table(df_male$sub_name_updated)), 
  as.data.frame(table(df_female$sub_name_updated)), 
  by = "Var1", all=TRUE)

# or use identity_updated

## add names
names(df_sex_ratio) <- c("genotype", "male", "female")

## change the NAs to 0
df_sex_ratio[is.na(df_sex_ratio)] <- 0

## collapse 820 wild-types together
combined_m <- df_sex_ratio[df_sex_ratio$genotype == "WT-820_3_5", ]$male + df_sex_ratio[df_sex_ratio$genotype == "WT-820", ]$male
combined_f <- df_sex_ratio[df_sex_ratio$genotype == "WT-820_3_5", ]$female + df_sex_ratio[df_sex_ratio$genotype == "WT-820", ]$female
df_sex_ratio <- rbind(df_sex_ratio, c("WT-820-combined", combined_m, combined_f))
# remove old rows
df_sex_ratio <- df_sex_ratio[-which(df_sex_ratio$genotype == "WT-820_3_5" | df_sex_ratio$genotype == "WT-820"), ]
# need to make numeric again
df_sex_ratio$male <- as.numeric(df_sex_ratio$male)
df_sex_ratio$female <- as.numeric(df_sex_ratio$female)
df_sex_ratio$genotype <- as.character(df_sex_ratio$genotype)
# add name for WT combined
df_sex_ratio$genotype[17] <- "WT-820-combined"

## calculate sex ratio
df_sex_ratio$sex_ratio <- (df_sex_ratio$male + 0.1)/(df_sex_ratio$female + 0.1)

## log sex ratio
df_sex_ratio$sex_ratio_log <- log10(df_sex_ratio$sex_ratio)

##view
df_sex_ratio
## remove WT-2 because it is really inappropriate to have a sex ratio for this:
df_sex_ratio <- df_sex_ratio[-which(df_sex_ratio$genotype == "WT-md5"),]

plot

library(latex2exp) # so you can plot the fraction, it's on CRAN `install.packages("latex2exp")`

## make extra column for plotting aesthetics:
df_sex_ratio$above <- df_sex_ratio$sex_ratio_log > 0

## make extra column with ratio in it:
df_sex_ratio$genotype_with_n <- paste0(df_sex_ratio$genotype, " (", df_sex_ratio$male, "/", df_sex_ratio$female, ")")

## reorder genotype so it is in the correct order for plotting
df_sex_ratio$genotype_with_n <- factor(df_sex_ratio$genotype_with_n, levels = df_sex_ratio$genotype_with_n[order(df_sex_ratio$sex_ratio_log)])

## plot
sex_ratio_plot <- ggplot(df_sex_ratio, aes(sex_ratio_log, genotype_with_n, color = above)) +
      ## add the lines for the lollipop plot
      geom_segment(aes(x = 0, y = genotype_with_n, xend = sex_ratio_log, yend = genotype_with_n), color = "grey50") +
      ## add the points for the lollipop plot
      geom_point(aes(size = 4)) +
      ## add the wild-type rectangle
      annotate("rect", xmin= -0.23182478, xmax = 1.00105797, ymin=-Inf , ymax=Inf, alpha=0.4, color=NA,linetype = 2, fill="#999999") +
      ## make prettier
      theme_classic() +
      theme(legend.position = "none", text=element_text(size=16, family="Arial")) + 
      ## change axis labels
      labs(y = TeX("$Genotype\ \\left(\\frac{n_{male}}{n_{female}}\\right)$")) +
      #labs(y = expression(paste("Genotype", group("(", frac(paste("n male"), "n female"), ")")))) +
      xlab(expression(paste("Sex Ratio (", log[10], 
                               group("(",
                                      frac(paste("n male + 0.1"), 
                                           paste("n female + 0.1")),
                                   ")"), ")" ))) +
      ## change colours of lollipops
      scale_colour_manual(values = c("#a52b1e", "#016c00")) +
      ## annotate phenotypes
      geom_hline(aes(yintercept = 5.5)) #+
      #geom_hline(aes(yintercept = 14.5))

print(sex_ratio_plot)

### CHANGE TO BIG BRACKET HERE: https://www.overleaf.com/learn/latex/Brackets_and_Parentheses 

#paste("Sex Ratio", "\n", "log10((n male + 0.1)/(n female + 0.1))")
#theme(legend.position = "none", text=element_text(size=16, family="Arial"))

## fraction titles: https://groups.google.com/forum/#!topic/ggplot2/bgNRnZ82hJY 
## https://uc-r.github.io/lollipop

#ggsave(filename = "../images_to_export/sex_ratio_plot.png", device = "png", width = 7, height = 7, units = "in")

B. SCMAP Method

Build the index

### Making an ortholog reference index

## load in mca data
#counts <- read.csv("../scmap/allpb10x_counts.csv", row.names = 1)
#pheno <- read.csv("../scmap/allpb10x_pheno.csv")
#ggplot(pheno, aes(x=PC2_3d, y = PC3_3d,colour=absclust3)) + geom_point()

## load required libraries
library(scmap) #https://bioconductor.org/packages/release/bioc/html/scmap.html 
library(SingleCellExperiment) #

#prep the SCE, if was originally a Suerat object need the dfs to be regular matrices
#pb_filtered_sce_orth <- pb_filtered_sce_orth[, colData(pb_filtered_sce_orth)$absclust3 != "8"]
#sce <- pb_filtered_sce_orth
#pca <- plotPCA(sce)
#pcs <- pca$data
#table(rownames(pcs)==colnames(sce))
#colData(sce) <- cbind(colData(sce), pcs)
#rowData(sce)$feature_symbol <- rowData(sce)$gene

## extract data from Seurat
#cells_tenx <- rownames(tenx.justwt.integrated@meta.data[which(tenx.justwt.integrated@meta.data$experiment == "tenx_5k"), ])
#tenx.justwt.integrated.10k <- subset(tenx.justwt.integrated, cells = cells_tenx)
counts = as.matrix(GetAssayData(tenx.justwt.integrated, slot = "counts", assay = "RNA"))
pheno = as.data.frame(tenx.justwt.integrated@meta.data)

## add UMAP coordinates
df_sex_cell_embeddings <- as.data.frame(tenx.justwt.integrated@reductions[["umap"]]@cell.embeddings)
pheno <- cbind(pheno, df_sex_cell_embeddings)

## Set up object
sce <- SingleCellExperiment(list(counts=counts),
    colData=DataFrame(label=pheno),
    rowData=DataFrame(feature_symbol=rownames(counts)))
sce

## manual logging
#counts_1 <- assay(sce, "counts")
#libsizes <- colSums(counts_1)
#size.factors <- libsizes/mean(libsizes)
#logcounts(sce) <- log2(t(t(counts_1)/size.factors) + 1)
#counts(sce) <- as.matrix(counts(sce))
#logcounts(sce) <- as.matrix(logcounts(sce))
logcounts(sce) <- as.matrix(GetAssayData(tenx.justwt.integrated, slot = "data", assay = "RNA"))

## remove features with duplicated names
sce <- sce[!duplicated(rownames(sce)), ]

## build scmap-cell reference index, save this rds
sce <- selectFeatures(sce, suppress_plot = FALSE, n_features = 2000)
table(rowData(sce)$scmap_features)
set.seed(1)
sce <- indexCell(sce, M = 50, k = 80)
names(metadata(sce)$scmap_cell_index)
#length(metadata(sce)$scmap_cell_index$subcentroids)
#dim(metadata(sce)$scmap_cell_index$subcentroids[[1]])
#metadata(sce)$scmap_cell_index$subcentroids[[1]][,1:5]
#dim(metadata(sce)$scmap_cell_index$subclusters)

#saveRDS(pb_filtered_sce_orth, file="pb_filtered_sce_orthindex_20181109.rds")

Map to the index

## see how this overlaps with fluorescence sorted on for 820 to confirm accuracy
df_820 <- colData(mutants.sce)[colData(mutants.sce)$label.genetic_background =="PBANKA_820", ]
table(df_820$topcell_sp, df_820$label.fluoresence_sorted_on)
               
                GFP Hoechst mCherry
  Asexual_Early   0      89       2
  Asexual_Late    5     150       3
  Female          2      84      54
  Male          158      47       0
  unassigned     52     549     198
## see how this overlaps with fluorescence sorted on for 820 to confirm accuracy
df_820 <- colData(mutants.sce)[colData(mutants.sce)$label.genetic_background =="PBANKA_820", ]
table(df_820$topcell_sp, df_820$label.fluoresence_sorted_on)
table(colData(mutants.sce)$topcell_sp, colData(mutants.sce)$label.identity_name_updated)
## see how this overlaps with fluorescence sorted on for 820 to confirm accuracy
df_820 <- colData(mutants.sce)[colData(mutants.sce)$label.genetic_background =="PBANKA_820", ]
table(df_820$stage_pred, df_820$label.fluoresence_sorted_on)
               
                GFP Hoechst mCherry
  Asexual_Early   1      81       2
  Asexual_Late    3     153       3
  Bipotential     0       1       0
  Female          2      83      52
  Male          156      47       0
  unassigned     55     554     200
## see how this overlaps with fluorescence sorted on for 820 to confirm accuracy
df_820 <- colData(mutants.sce)[colData(mutants.sce)$label.genetic_background =="PBANKA_820", ]
table(df_820$stage_pred, df_820$label.fluoresence_sorted_on)
table(colData(mutants.sce)$stage_pred, colData(mutants.sce)$label.identity_name_updated)

make final plot with MCA data below

## Read in MCA data
pheno <- read.csv("../scmap/allpb10x_pheno.csv")
## extract rows needed for plotting
df_plot <- pheno[,c("PC2_3d", "PC3_3d", "absclust3")]
df_plot$experiment <- df_plot$absclust3

## extract rows needed for plotting from mapped object
df_plot_pm <- as.data.frame(colData(pm_ss2_field.sce.orth))
df_plot_pm <- df_plot_pm[ ,c("X_snap", "Y_snap", "label.absclust3")]
df_plot_pm$experiment <- "pm"
colnames(df_plot_pm) <- c("PC2_3d", "PC3_3d", "absclust3", "experiment")

## extract rows needed for plotting from mapped object
df_plot_pf <- as.data.frame(colData(pf_ss2_field.sce.orth))
df_plot_pf <- df_plot_pf[ ,c("X_snap", "Y_snap", "label.absclust3")]
df_plot_pf$experiment <- "pf"
colnames(df_plot_pf) <- c("PC2_3d", "PC3_3d", "absclust3", "experiment")

## bind together
df_plot <- rbind(df_plot, df_plot_pf, df_plot_pm)

colors <- c("6"="#78C679",
            "2"="#D1EC9F",
            "0"="#FEB24C",
            "1"="#F4CF63",
            "3"="#FEEEAA",
            "4"="#85B1D3",
            "7"="#9ecae1",
            "5"="#C9E8F1",
            "M"= "#B7B7D8",
            "F"="#9C96C6",
            "unassigned"="black",
            "pm" = "#dc65a4",
            "pf" = "#24245f")

## add alpha for plotting
df_plot$alpha[!df_plot$experiment == "pm"] <- 0.5
df_plot$alpha[df_plot$experiment == "pm"] <- 1
df_plot$alpha[df_plot$experiment == "pf"] <- 1

##plot
scmap_pca <- ggplot(df_plot, aes(x=PC2_3d, y = PC3_3d,colour=experiment, alpha = alpha)) + 
  geom_point() +
  theme_void() +
  scale_color_manual(values = colors) +
  ## remove alpha scale
  scale_alpha_continuous(guide=FALSE)
  ## draw ring around field samples
  #geom_point(data=df_plot[df_plot$experiment == "pm", ], pch=21, fill=NA, size=2, colour="black", stroke=1) +
  ## make non-field samples more opaque
  #geom_point(data=df_plot[-df_plot$experiment == "pm", ], alpha = 0.2)

## view
scmap_pca_2 <- scmap_pca + 
  guides(colour=guide_legend(override.aes = list(size=4)))

scmap_pca_2

save

#ggsave("../images_to_export/field_samples_scmap.png", plot = scmap_pca_2, device = "png", path = NULL, scale = 1, width = 15, height = 10, units = "cm", dpi = 300, limitsize = TRUE)

Overlap and validation

11. Save and Export

save

Save environment

## This saves everything in the global environment for easy recall later
#save.image(file = "GCSKO_merge.RData")
#load(file = "GCSKO_merge.RData")

Save object(s)

## Save an object to a file
saveRDS(tenx.justwt.integrated.sex, file = "../data_to_export/tenx.justwt.integrated.sex.RDS")
## Restore the object
#readRDS(file = "../data_to_export/tenx.mutant.integrated.sex.RDS")

## save integrated object to file
saveRDS(tenx.justwt.integrated, file = "../data_to_export/tenx.justwt.integrated.RDS") 
## restore the object
#tenx.justwt.integrated <- readRDS("../data_to_export/tenx.justwt.integrated.RDS")

## save monocle object
saveRDS(monocle.object.all, file = "../data_to_export/monocle.object.all.RDS") 
## restore the object
#monocle.object.all <- readRDS("../data_to_export/monocle.object.all.RDS")

## save integrated object to file
#saveRDS(GCSKO_mutants, file = "../data_to_export/GCSKO_mutants.RDS") 
## restore the object
#tenx.justwt.integrated <- readRDS("../data_to_export/tenx.justwt.integrated.RDS")

## extract UMAP coordinates and metadata
## extract metadata
mca_website_df <- tenx.justwt.integrated@meta.data

## add umap - merge on row names and then make sure row names are not dropped or added
mca_website_df <- transform(merge(mca_website_df, tenx.justwt.integrated@reductions[["umap"]]@cell.embeddings, by=0, all=TRUE), row.names=Row.names, Row.names=NULL)

## send this to sunil:
# ggplot(mca_website_df, aes(x = UMAP_1, y = UMAP_2)) +
# geom_point(col = mca_website_df$pt_id_cols) +
# theme_void() +
# coord_fixed()

## export
write.csv(mca_website_df, file = "../data_to_export/mca_website_df.csv")

Clean up

#rm(ss2_wt_cells)
#rm(tenx.justwt.integrated)
#rm(tenx.justwt.list)

final figure construction

Cowplot(plot_grid), patchwork(wrap_plots), and ggpubr can all allow multiple plots to be plotted together.

## A
# umap_id_pt
## B
# marker gene expression
## C
# Mutant gene expression
## D
# Modules

#Figure_A <- grid.arrange(arrangeGrob(QC_composite_plot, QC_mito_violin, QC_mito_graph, QC_by_genotype, mapping_rate_plot), nrow=3), nrow=2, heights=c(10,2))

## cowplot method
## can use this for labels: toupper(letters)[1:10]

## C. Mutant genes
mutant_genes_figure <- plot_grid(
                                 ## marker genes starts
                                 marker_gene_plot_FAMB,
                                 marker_gene_plot_MSP8,
                                 marker_gene_plot_MSP1,
                                 marker_gene_plot_AP2G,
                                 marker_gene_plot_CCP2,
                                 marker_gene_plot_MG1,
                                 ## mutant genes starts  
                                 marker_gene_plot_gd1,
                                 marker_gene_plot_md1,
                                 marker_gene_plot_md2,
                                 marker_gene_plot_md3,
                                 marker_gene_plot_md4, 
                                 marker_gene_plot_md5,
                                 marker_gene_plot_fd1,
                                 marker_gene_plot_fd2,
                                 marker_gene_plot_fd3,
                                 marker_gene_plot_fd4, 
  label_size = 1, 
  nrow = 4)

## labels as uppercase letters:
#c(toupper(letters)[2:17])
# labels as roman numerals:
# c(tolower(as.roman(c(2:17))

Figure_publication <- plot_grid(umap_id_pt + theme(plot.margin = unit(c(0, 0, 0, 0), "cm")), 
                                mutant_genes_figure,
                                ## add empty plot to give spacing
                                ggplot() + theme_void(),
                                labels = c('A', 'B'), 
                                label_size = 12, 
                                ncol = 2, 
                                nrow=2, 
                                rel_heights = c(1, 1, 4), 
                                rel_widths = c(1, 2, 3))

Figure_publication

save

ggsave("../images_to_export/Figure_C.png", plot = Figure_publication, device = "png", path = NULL, scale = 1, width = 21, height = 29.7, units = "cm", dpi = 300, limitsize = TRUE)
layout <- "
AAABBB
CCDDEE
FFGGHH
IIJJKK
"

Figure_publication_A <- (composition_umap_10x + theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 10))) +
(composition_umap_ss2 + theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 10))) + 
plot_layout(ncol = 2)

Figure_publication <- Figure_publication_A +
#(UMAP_hoo + theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 10)) + labs(title="Predicted Asexual Cycle Real Timepoint")) +
#(UMAP_kasia + theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 10)) + labs(title="Predicted Gametocytogenesis Real Timepoint")) +
  ## marker genes
marker_gene_plot_FAMB +
marker_gene_plot_MSP8 +
marker_gene_plot_MSP1 +
marker_gene_plot_CDPK5 +
marker_gene_plot_AP2G +
marker_gene_plot_CCP2 +
marker_gene_plot_MG1 +
marker_gene_plot_ORC1 +
marker_gene_plot_MCM4 +
  ## mutant genes starts  
#marker_gene_plot_gd1 +
#marker_gene_plot_md1 +
#marker_gene_plot_md2 +
#marker_gene_plot_md3 +
#marker_gene_plot_md4 + 
#marker_gene_plot_md5 +
#marker_gene_plot_fd1 +
#marker_gene_plot_fd2 +
#marker_gene_plot_fd3 +
#marker_gene_plot_fd4 + 
plot_layout(ncol = 6, 
            widths = c(1, 1, 1, 1, 1 ,1),
            heights = c(2, 1, 1, 1),
            design = layout
            )

Figure_publication

save

sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
 [1] stats4    parallel  grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] colourvalues_0.3.7          readxl_1.3.1                destiny_3.2.0               circlize_0.4.12            
 [5] gganimate_1.0.7             shiny_1.5.0                 monocle3_0.2.3.0            SingleCellExperiment_1.10.1
 [9] SummarizedExperiment_1.18.2 DelayedArray_0.14.1         matrixStats_0.57.0          GenomicRanges_1.40.0       
[13] GenomeInfoDb_1.24.2         IRanges_2.22.2              S4Vectors_0.26.1            Biobase_2.48.0             
[17] BiocGenerics_0.34.0         Nebulosa_1.2.0              reshape2_1.4.4              Hmisc_4.4-1                
[21] Formula_1.2-4               survival_3.2-7              lattice_0.20-41             gridExtra_2.3              
[25] dplyr_1.0.2                 patchwork_1.0.1             ggplot2bdc_0.3.2            cowplot_1.1.0              
[29] ggpubr_0.4.0                ggplot2_3.3.2               viridis_0.5.1               viridisLite_0.3.0          
[33] Seurat_3.2.2                colorspace_1.4-1           

loaded via a namespace (and not attached):
  [1] ggthemes_4.2.4            coda_0.19-4               tidyr_1.1.2               knitr_1.30               
  [5] irlba_2.3.3               data.table_1.13.2         rpart_4.1-15              RCurl_1.98-1.2           
  [9] generics_0.0.2            callr_3.5.1               leidenbase_0.1.2          usethis_1.6.3            
 [13] RANN_2.6.1                proxy_0.4-24              future_1.19.1             spatstat.data_1.4-3      
 [17] httpuv_1.5.4              assertthat_0.2.1          gifski_0.8.6              xfun_0.18                
 [21] hms_0.5.3                 evaluate_0.14             promises_1.1.1            DEoptimR_1.0-8           
 [25] fansi_0.4.1               progress_1.2.2            DBI_1.1.0                 igraph_1.2.6             
 [29] htmlwidgets_1.5.2         spdep_1.1-5               purrr_0.3.4               ellipsis_0.3.1           
 [33] RSpectra_0.16-0           crosstalk_1.1.0.1         ks_1.12.0                 backports_1.1.10         
 [37] deldir_0.1-29             vctrs_0.3.4               TTR_0.24.2                remotes_2.2.0            
 [41] ROCR_1.0-11               abind_1.4-5               RcppEigen_0.3.3.7.0       withr_2.3.0              
 [45] grr_0.9.5                 robustbase_0.93-7         checkmate_2.0.0           vcd_1.4-8                
 [49] sctransform_0.3.1         xts_0.12.1                prettyunits_1.1.1         mclust_5.4.7             
 [53] goftest_1.2-2             cluster_2.1.0             lazyeval_0.2.2            laeken_0.5.1             
 [57] crayon_1.3.4              units_0.6-7               slam_0.1-47               pkgconfig_2.0.3          
 [61] labeling_0.4.2            tweenr_1.0.1              nlme_3.1-149              pkgload_1.1.0            
 [65] nnet_7.3-14               devtools_2.3.2            rlang_0.4.8               globals_0.13.1           
 [69] lifecycle_0.2.0           miniUI_0.1.1.1            rsvd_1.0.3                cellranger_1.1.0         
 [73] rprojroot_1.3-2           polyclip_1.10-0           RcppHNSW_0.3.0            lmtest_0.9-38            
 [77] Matrix_1.2-18             raster_3.3-13             carData_3.0-4             Matrix.utils_0.9.8       
 [81] boot_1.3-25               zoo_1.8-8                 base64enc_0.1-3           ggridges_0.5.2           
 [85] GlobalOptions_0.1.2       processx_3.4.4            pheatmap_1.0.12           png_0.1-7                
 [89] bitops_1.0-6              KernSmooth_2.23-17        DelayedMatrixStats_1.10.1 classInt_0.4-3           
 [93] shape_1.4.5               stringr_1.4.0             jpeg_0.1-8.1              rstatix_0.6.0            
 [97] ggsignif_0.6.0            scales_1.1.1              memoise_1.1.0             magrittr_2.0.1           
[101] plyr_1.8.6                hexbin_1.28.1             ica_1.0-2                 gdata_2.18.0             
[105] zlibbioc_1.34.0           compiler_4.0.3            RColorBrewer_1.1-2        pcaMethods_1.80.0        
[109] fitdistrplus_1.1-1        cli_2.1.0                 LearnBayes_2.15.1         XVector_0.28.0           
[113] listenv_0.8.0             pbapply_1.4-3             ps_1.4.0                  htmlTable_2.1.0          
[117] ggplot.multistats_1.0.0   MASS_7.3-53               mgcv_1.8-33               tidyselect_1.1.0         
[121] stringi_1.5.3             forcats_0.5.0             yaml_2.2.1                latticeExtra_0.6-29      
[125] ggrepel_0.8.2             pbmcapply_1.5.0           tools_4.0.3               future.apply_1.6.0       
[129] rio_0.5.16                rstudioapi_0.11           foreign_0.8-80            smoother_1.1             
[133] scatterplot3d_0.3-41      farver_2.0.3              Rtsne_0.15                digest_0.6.27            
[137] BiocManager_1.30.10       Rcpp_1.0.6                car_3.0-10                broom_0.7.2              
[141] later_1.1.0.1             RcppAnnoy_0.0.16          httr_1.4.2                sf_0.9-6                 
[145] fs_1.5.0                  tensor_1.5                ranger_0.12.1             reticulate_1.18          
[149] splines_4.0.3             uwot_0.1.8                expm_0.999-5              spatstat.utils_1.17-0    
[153] sp_1.4-4                  spData_0.3.8              plotly_4.9.2.1            sessioninfo_1.1.1        
[157] xtable_1.8-4              jsonlite_1.7.1            spatstat_1.64-1           testthat_2.3.2           
[161] R6_2.5.0                  gmodels_2.18.1            pillar_1.4.6              htmltools_0.5.1.1        
[165] mime_0.9                  glue_1.4.2                fastmap_1.0.1             VIM_6.1.0                
[169] class_7.3-17              codetools_0.2-16          utf8_1.1.4                pkgbuild_1.1.0           
[173] mvtnorm_1.1-1             tibble_3.0.4              curl_4.3                  leiden_0.3.3             
[177] gtools_3.8.2              zip_2.1.1                 openxlsx_4.2.2            limma_3.44.3             
[181] rmarkdown_2.5             desc_1.2.0                munsell_0.5.0             e1071_1.7-4              
[185] GenomeInfoDbData_1.2.3    haven_2.3.1               gtable_0.3.0             

Appendix

Session Info

Expression plots

Expression - raw - Viridis

## find a good ring marker, to see if there is a better one than the ones reported
#markers_ring <- FindMarkers(tenx.justwt.integrated, ident.1 = c("4", "5", "16", "11", "7", "3", "9", "0", "22"))
#head(markers_ring)

# PBANKA-1319500 - CCP2 - female - used in 820 line
# PBANKA-0416100 - MG1 - dynenin heavy chain - male - used in 820 line
# PBANKA-1437500 - AP2G - commitment
# PBANKA-0831000 - MSP1 - late asexual
# PBANKA-1102200 - MSP8 - early asexual (from Bozdech paper)
# PBANKA-0711900 - HSP70 - promoter used for GFP and RFP expression in the mutants
# PBANKA-1400400 - FAMB - ring marker - discovered by looking for marker genes in data
# PBANKA-0722600 - Fam-b2 - ring marker - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5113031/ 


marker_gene_plot_CCP2 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-1319500", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste(italic(ccp2), "(Female)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

marker_gene_plot_MG1 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-0416100", coord.fixed = TRUE, 
                                    #min.cutoff = "q1", 
                                    dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste(italic(mg1), "(Male)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

marker_gene_plot_AP2G <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-1437500", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste(italic(ap2g), "(Commitment)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

marker_gene_plot_MSP1 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-0831000", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste(italic(msp1), "(Schizont)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

marker_gene_plot_MSP8 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-1102200", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste(italic(msp8), "(Asexual)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

marker_gene_plot_SBP1 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-1101300", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste(italic(sbp1), "(Ring)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

marker_gene_plot_FAMB <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-0722600", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste(italic(fam-b2),  "(Ring)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

marker_gene_plot_HSP70 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-0711900", coord.fixed = TRUE, 
                                      #min.cutoff = "q1", 
                                      dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste(italic(HSP70),  "(Reporter)","\n", "PBANKA_0711900")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

##original label:
# labs(title = paste("(CCP2; Female)","\n", "PBANKA_1319500"))

## plot
## cowplot method
marker_gene_plot_all <- plot_grid(marker_gene_plot_FAMB, marker_gene_plot_MSP8, marker_gene_plot_MSP1, marker_gene_plot_AP2G, marker_gene_plot_CCP2, marker_gene_plot_MG1, marker_gene_plot_HSP70, nrow=3)

marker_gene_plot_all

## patchwork method
#marker_gene_plot_FAMB + marker_gene_plot_MSP8 + marker_gene_plot_MSP1 + marker_gene_plot_AP2G + marker_gene_plot_CCP2 + marker_gene_plot_MG1 + marker_gene_plot_HSP70

Expression - raw - purple


marker_gene_ramp <- colorRampPalette(c("#D3D3D3", "#1D1564"))(50)

marker_gene_plot_CCP2 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-1319500", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("CCP2 (Female)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=marker_gene_ramp)

marker_gene_plot_MG1 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-0416100", coord.fixed = TRUE, 
                                    #min.cutoff = "q1", 
                                    dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("MG1 (Male)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=marker_gene_ramp)

marker_gene_plot_AP2G <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-1437500", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("AP2G (Commitment)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=marker_gene_ramp)

marker_gene_plot_MSP1 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-0831000", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("MSP1 (Schizont)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=marker_gene_ramp)

marker_gene_plot_MSP8 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-1102200", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("MSP8 (Asexual)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=marker_gene_ramp)

marker_gene_plot_SBP1 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-1101300", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("SBP1 (Ring)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=marker_gene_ramp)

marker_gene_plot_FAMB <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-0722600", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("Fam-b2 (Ring)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=marker_gene_ramp)

marker_gene_plot_HSP70 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-0711900", coord.fixed = TRUE, 
                                      #min.cutoff = "q1", 
                                      dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("(HSP70; Reporter)","\n", "PBANKA_0711900")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=marker_gene_ramp)

## plot
## cowplot method
marker_gene_plot_all <- plot_grid(marker_gene_plot_FAMB, marker_gene_plot_MSP8, marker_gene_plot_MSP1, marker_gene_plot_AP2G, marker_gene_plot_CCP2, marker_gene_plot_MG1, marker_gene_plot_HSP70, nrow=3)

marker_gene_plot_all

Expression - data - purple

# PBANKA-1418100        GCSKO-17  FD3   
# PBANKA-0102400         GCSKO-2  MD3 
# PBANKA-0716500        GCSKO-19  MD4 
# PBANKA-1435200        GCSKO-20  FD4 
# PBANKA-0902300        GCSKO-13  FD2
# PBANKA-0413400    GCSKO-10_820  MD5
# PBANKA-0828000         GCSKO-3  GD1
# PBANKA-1302700       GCSKO-oom  MD1 
# PBANKA-1447900        GCSKO-29  MD2
# PBANKA-1454800        GCSKO-21  FD1
# PBANKA-1144800        GCSKO-28  FD5


marker_gene_plot_17 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1418100", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("fd3")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_2 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0102400", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md3")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_19 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0716500", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md4")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_20 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1435200", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("fd4")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_13 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0902300", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("fd2")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_10 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0413400", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md5")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_3 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0828000", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("gd1")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_oom <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1302700", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md1")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_29 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1447900", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md2")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_21 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1454800", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("fd1")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

##original label:
# labs(title = paste("(CCP2; Female)","\n", "PBANKA_1319500"))

## make composite plot
mutant_expression_composite <- wrap_plots(marker_gene_plot_17 , marker_gene_plot_2 , marker_gene_plot_19 , marker_gene_plot_20 , marker_gene_plot_13 , marker_gene_plot_10 , marker_gene_plot_3 , marker_gene_plot_oom , marker_gene_plot_29 , marker_gene_plot_21 , ncol = 4)
           
## print
mutant_expression_composite

Density Plots

Density - purple

# PBANKA-1319500 - CCP2 - female - used in 820 line
# PBANKA-0416100 - MG1 - dynenin heavy chain - male - used in 820 line
# PBANKA-1437500 - AP2G - commitment
# PBANKA-0831000 - MSP1 - late asexual
# PBANKA-1102200 - MSP8 - early asexual (from Bozdech paper)
# PBANKA-0711900 - HSP70 - promoter used for GFP and RFP expression in the mutants
# PBANKA-1400400 - FAMB - ring marker - discovered by looking for marker genes in data
# PBANKA-0722600 - Fam-b2 - ring marker - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5113031/ 

markers_list <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-0413400", "PBANKA-1454800", "PBANKA-0902300", "PBANKA-1418100", "PBANKA-1435200")

list_of_density_plots_mutant_genes <- plot_density(tenx.justwt.integrated, markers_list, joint = FALSE, combine = FALSE, dims = c(1,2), pal = "plasma", method = "ks")

## make composite plot
UMAP_composite_mutant_genes <- wrap_plots(list_of_density_plots_mutant_genes[[1]] + 
                                            coord_fixed() + 
                                            theme_void() + 
                                            labs(title = paste("gd1")) + 
                                            scale_colour_gradientn(colours=marker_gene_ramp) + 
                                            guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[2]] + 
                               coord_fixed() + theme_void() + 
                               labs(title = paste("md1")) + 
                               scale_colour_gradientn(colours=marker_gene_ramp) + 
                               guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[3]] + coord_fixed() + theme_void() + labs(title = paste("md2")) + scale_colour_gradientn(colours=marker_gene_ramp) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[4]] + coord_fixed() + theme_void() + labs(title = paste("md3")) + scale_colour_gradientn(colours=marker_gene_ramp) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[5]] + coord_fixed() + theme_void() + 
  labs(title = paste("md4")) + scale_colour_gradientn(colours=marker_gene_ramp) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[6]] + coord_fixed() + theme_void() + labs(title = paste("md5")) + scale_colour_gradientn(colours=marker_gene_ramp) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[7]] + coord_fixed() + theme_void() + labs(title = paste("fd1")) + scale_colour_gradientn(colours=marker_gene_ramp) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[8]] + coord_fixed() + theme_void() + labs(title = paste("fd2")) + scale_colour_gradientn(colours=marker_gene_ramp) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[9]] + coord_fixed() + theme_void() + labs(title = paste("fd3")) + scale_colour_gradientn(colours=marker_gene_ramp) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[10]] + coord_fixed() + theme_void() + labs(title = paste("fd4")) + scale_colour_gradientn(colours=marker_gene_ramp) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             ncol = 4)

UMAP_composite_mutant_genes

Density -viridis

# PBANKA-1319500 - CCP2 - female - used in 820 line
# PBANKA-0416100 - MG1 - dynenin heavy chain - male - used in 820 line
# PBANKA-1437500 - AP2G - commitment
# PBANKA-0831000 - MSP1 - late asexual
# PBANKA-1102200 - MSP8 - early asexual (from Bozdech paper)
# PBANKA-0711900 - HSP70 - promoter used for GFP and RFP expression in the mutants
# PBANKA-1400400 - FAMB - ring marker - discovered by looking for marker genes in data
# PBANKA-0722600 - Fam-b2 - ring marker - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5113031/ 

markers_list <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-0413400", "PBANKA-1454800", "PBANKA-0902300", "PBANKA-1418100", "PBANKA-1435200")

list_of_density_plots_mutant_genes <- plot_density(tenx.justwt.integrated, markers_list, joint = FALSE, combine = FALSE, dims = c(1,2), pal = "plasma", method = "ks")

## make composite plot
UMAP_composite_mutant_genes <- wrap_plots(list_of_density_plots_mutant_genes[[1]] + 
                                            coord_fixed() + 
                                            theme_void() + 
                                            labs(title = paste("gd1")) + 
                                            scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + 
                                            guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[2]] + 
                               coord_fixed() + theme_void() + 
                               labs(title = paste("md1")) + 
                               scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + 
                               guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[3]] + coord_fixed() + theme_void() + labs(title = paste("md2")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[4]] + coord_fixed() + theme_void() + labs(title = paste("md3")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)) )+ guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[5]] + coord_fixed() + theme_void() + 
  labs(title = paste("md4")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[6]] + coord_fixed() + theme_void() + labs(title = paste("md5")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[7]] + coord_fixed() + theme_void() + labs(title = paste("fd1")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[8]] + coord_fixed() + theme_void() + labs(title = paste("fd2")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[9]] + coord_fixed() + theme_void() + labs(title = paste("fd3")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[10]] + coord_fixed() + theme_void() + labs(title = paste("fd4")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             ncol = 4)

UMAP_composite_mutant_genes

Density - viridis

# PBANKA-0828000         GCSKO-3  GD1

# PBANKA-1302700       GCSKO-oom  MD1 
# PBANKA-1447900        GCSKO-29  MD2
# PBANKA-0102400         GCSKO-2  MD3 
# PBANKA-0716500        GCSKO-19  MD4 
# PBANKA-0413400    GCSKO-10_820  MD5

# PBANKA-1454800        GCSKO-21  FD1
# PBANKA-0902300        GCSKO-13  FD2
# PBANKA-1418100        GCSKO-17  FD3   
# PBANKA-1435200        GCSKO-20  FD4 

markers_list <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-0413400", "PBANKA-1454800", "PBANKA-0902300", "PBANKA-1418100", "PBANKA-1435200")

list_of_density_plots_mutant_genes <- plot_density(tenx.justwt.integrated, markers_list, joint = FALSE, combine = FALSE, dims = c(1,2), pal = "plasma", method = "ks")

## make composite plot
UMAP_composite_mutant_genes <- wrap_plots(list_of_density_plots_mutant_genes[[1]] + 
                                            coord_fixed() + 
                                            theme_void() + 
                                            labs(title = paste("gd1")) + 
                                            scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + 
                                            guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[2]] + 
                               coord_fixed() + theme_void() + 
                               labs(title = paste("md1")) + 
                               scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + 
                               guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[3]] + coord_fixed() + theme_void() + labs(title = paste("md2")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[4]] + coord_fixed() + theme_void() + labs(title = paste("md3")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)) )+ guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[5]] + coord_fixed() + theme_void() + 
  labs(title = paste("md4")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[6]] + coord_fixed() + theme_void() + labs(title = paste("md5")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[7]] + coord_fixed() + theme_void() + labs(title = paste("fd1")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[8]] + coord_fixed() + theme_void() + labs(title = paste("fd2")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[9]] + coord_fixed() + theme_void() + labs(title = paste("fd3")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[10]] + coord_fixed() + theme_void() + labs(title = paste("fd4")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             ncol = 4)

UMAP_composite_mutant_genes

## IF putting in MAIN - list_of_density_plots_mutant_genes[[2]] + coord_fixed() + theme_void() + theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 8), text=element_text(size=9)) + labs(title = paste("md1")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.3, barheight = 4.0, title = ""))
# PBANKA-0828000         GCSKO-3  GD1

# PBANKA-1302700       GCSKO-oom  MD1 
# PBANKA-1447900        GCSKO-29  MD2
# PBANKA-0102400         GCSKO-2  MD3 
# PBANKA-0716500        GCSKO-19  MD4 
# PBANKA-0413400    GCSKO-10_820  MD5

# PBANKA-1454800        GCSKO-21  FD1
# PBANKA-0902300        GCSKO-13  FD2
# PBANKA-1418100        GCSKO-17  FD3   
# PBANKA-1435200        GCSKO-20  FD4 

markers_list <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-0413400", "PBANKA-1454800", "PBANKA-0902300", "PBANKA-1418100", "PBANKA-1435200")

list_of_density_plots_mutant_genes <- plot_density(tenx.justwt.integrated, markers_list, joint = FALSE, combine = FALSE, dims = c(1,2), pal = "plasma", method = "wkde", slot = 'data')

## make composite plot
UMAP_composite_mutant_genes <- wrap_plots(list_of_density_plots_mutant_genes[[1]] + 
                                            coord_fixed() + 
                                            theme_void() + 
                                            labs(title = paste("gd1")) + 
                                            scale_color_continuous_sequential(palette = "Purples 2") + 
                                            guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[2]] + 
                               coord_fixed() + theme_void() + 
                               labs(title = paste("md1")) + 
                               scale_color_continuous_sequential(palette = "Purples 2") + 
                               guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[3]] + 
                               coord_fixed() + 
                               theme_void() + 
                               labs(title = paste("md2")) + scale_color_continuous_sequential(palette = "Purples 2") + 
                               guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[4]] + coord_fixed() + theme_void() + labs(title = paste("md3")) + scale_color_continuous_sequential(palette = "Purples 2") + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[5]] + coord_fixed() + theme_void() + 
  labs(title = paste("md4")) + scale_color_continuous_sequential(palette = "Purples 2") + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[6]] + coord_fixed() + theme_void() + labs(title = paste("md5")) + scale_color_continuous_sequential(palette = "Purples 2") + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[7]] + coord_fixed() + theme_void() + labs(title = paste("fd1")) + scale_color_continuous_sequential(palette = "Purples 2") + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[8]] + coord_fixed() + theme_void() + labs(title = paste("fd2")) + scale_color_continuous_sequential(palette = "Purples 2") + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[9]] + coord_fixed() + theme_void() + labs(title = paste("fd3")) + scale_color_continuous_sequential(palette = "Purples 2") + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[10]] + coord_fixed() + theme_void() + labs(title = paste("fd4")) + scale_color_continuous_sequential(palette = "Purples 2") + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             ncol = 4)

## old colour scale: scale_colour_gradientn(colours=marker_gene_ramp )

UMAP_composite_mutant_genes
---
subtitle: 'Gametocyte Development in <i>Plasmodium berghei</i>'
title: |
  ![](../GCSKO_logo.jpg){width=300px}  
  Merging Smart-seq2 and 10X Datasets - wild-type only
author: "[Andrew Russell](https://ajcrussell.wixsite.com/mysite/about)"
institute: Wellcome Sanger Institute
date: '`r format(Sys.Date(), "%B %d, %Y")`'
output:
  html_notebook:
    theme: cosmo
    toc: yes
    toc_depth: 3
    #toc_float: yes
    df_print: paged
---
***
# 1. Introduction and Aims {.tabset}

We have quality-controlled the 10X data and the SS2 data and now are left with the following objects:

10X 5K data - pb_sex_filtered

10X 30K data - pb_30k_sex_filtered 

SS2 mutant data - ss2_mutants_final

# 2. Read in the data  {.tabset}

### Load/Install the Required Packages

```{r load packages, echo = FALSE}
## CRAN packages

## Pathwork is needed to stich plots together using '+'
if(require("patchwork", quietly = TRUE)){
    print("patchwork is loaded correctly")
} else {
    print("trying to install patchwork")
    install.packages("patchwork")
    if(require(patchwork)){
        print("patchwork installed and loaded")
    } else {
        stop("could not install patchwork")
    }
}

## viridis allows different colours to be added to plots
if(require("viridis", quietly = TRUE)){
    print("viridis is loaded correctly")
} else {
    print("trying to install viridis")
    install.packages("viridis")
    if(require(viridis)){
        print("viridis installed and loaded")
    } else {
        stop("could not install viridis")
    }
}

## Seurat is needed for most of this script
if(require("Seurat", quietly = TRUE)){
    print("Seurat is loaded correctly")
} else {
    print("trying to install Seurat")
    install.packages("Seurat")
    if(require(Seurat)){
        print("Seurat installed and loaded")
    } else {
        stop("could not install Seurat")
    }
}

## cowplot is needed for plots in this script
if(require("cowplot")){
    print("cowplot is loaded correctly")
} else {
    print("trying to install cowplot")
    install.packages("cowplot")
    if(require(cowplot)){
        print("cowplot installed and loaded")
    } else {
        stop("could not install cowplot")
    }
}

## gridExtra is needed for grid graphics to plot multiple plots in the same view
if(require("gridExtra")){
    print("gridExtra is loaded correctly")
} else {
    print("trying to install gridExtra")
    install.packages("gridExtra")
    if(require(gridExtra)){
        print("gridExtra installed and loaded")
    } else {
        stop("could not install gridExtra")
    }
}

## grid is needed for grid.arrange function to change size of title
if(require("grid")){
    print("grid is loaded correctly")
} else {
    print("trying to install grid")
    install.packages("grid")
    if(require(grid)){
        print("grid installed and loaded")
    } else {
        stop("could not install grid")
    }
}

##for doing bulk correlation calculations
if(require("Hmisc")){
    print("Hmisc is loaded correctly")
} else {
    print("trying to install Hmisc")
    install.packages("Hmisc")
    if(require(Hmisc)){
        print("Hmisc installed and loaded")
    } else {
        stop("could not install Hmisc")
    }
}

## reshape2 to melt dataframes for plotting:
if(require("reshape2")){
    print("reshape2 is loaded correctly")
} else {
    print("trying to install reshape2")
    install.packages("reshape2")
    if(require(reshape2)){
        print("reshape2 installed and loaded")
    } else {
        stop("could not install reshape2")
    }
}

## to work with data frames:
if(require("dplyr")){
    print("dplyr is loaded correctly")
} else {
    print("trying to install dplyr")
    install.packages("dplyr")
    if(require(dplyr)){
        print("dplyr installed and loaded")
    } else {
        stop("could not install dplyr")
    }
}

## non-CRAN packages

## to make density plots showing gene expression
if(require("Nebulosa")){
    print("Nebulosa is loaded correctly")
} else {
    print("trying to install Nebulosa")
    devtools::install_github("powellgenomicslab/Nebulosa")
    if(require(Nebulosa)){
        print("Nebulosa installed and loaded")
    } else {
        stop("could not install Nebulosa")
    }
}

## monocle3 to calculate pseudotime:
if(require("monocle3")){
    print("monocle3 is loaded correctly")
} else {
    print("Please install monocle3 (https://cole-trapnell-lab.github.io/monocle3/docs/installation/)")
}

## set the seed for both the mixture models and also for the sample function later on:
set.seed(-92497)
```

### Read in the Data

screen hits
```{r}
## EDIT - change this to the excel table once we have it finalized for the screen
screen_hits <- c("PBANKA-0516300",
"PBANKA-1217700",
"PBANKA-0409100",
"PBANKA-1034300",
"PBANKA-1437500",
"PBANKA-0827500",
"PBANKA-0824300",
"PBANKA-1426900",
"PBANKA-0105300",
"PBANKA-0921100",
"PBANKA-1002400",
"PBANKA-0829400",
"PBANKA-1347200",
"PBANKA-0828000",
"PBANKA-0902300",
"PBANKA-1418100",
"PBANKA-1435200",
"PBANKA-1454800",
"PBANKA-0712300",
"PBANKA-0410500",
"PBANKA-1144800",
"PBANKA-1231600",
"PBANKA-0503200",
"PBANKA-0308900",
"PBANKA-1214700",
"PBANKA-0709900",
"PBANKA-0311900",
"PBANKA-0716500",
"PBANKA-1447900",
"PBANKA-0102200",
"PBANKA-0713500",
"PBANKA-0102400",
"PBANKA-1302700",
"PBANKA-1235900",
"PBANKA-0401100",
"PBANKA-0413400",
"PBANKA-1126900",
"PBANKA-1425900",
"PBANKA-0418300",
"PBANKA-1464600",
"PBANKA-0806000")
```

Read in gene annotations
```{r}
gene_annotations <- read.table("../data/Reference/GenesByTaxon_Summary.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)
dim(gene_annotations)

## convert _ to -
gene_annotations$Gene.ID <- gsub("_", "-", gene_annotations$Gene.ID)
```

load in datasets
```{r}
## load the 10X dataset
pb_sex_filtered <- readRDS("../data_to_export/pb_sex_filtered.RDS")
## load the SS2 dataset
ss2_mutants_final <- readRDS("../data_to_export/ss2_mutants_final.RDS")

## inspect
paste("10x dataset")
pb_sex_filtered
paste("Smart-seq2 dataset")
ss2_mutants_final
paste("The composition of the Smart-seq2 dataset is:")
table(ss2_mutants_final@meta.data$genotype)
```

# 3. Merging the Smart-seq2 and 10X Data {.tabset}

### Prepare data

```{r integration 10x setup}
## extract 10x data
tenx_5k_counts <- as.matrix(pb_sex_filtered@assays$RNA@counts)
tenx_5k_pheno <- pb_sex_filtered@meta.data

## Create fresh object
tenx_5k_counts_to_integrate <- CreateSeuratObject(counts = tenx_5k_counts, meta.data = tenx_5k_pheno, min.cells = 0, min.features = 0, project = "GCSKO")

## add experiment meta data
tenx_5k_counts_to_integrate@meta.data$experiment <- "tenx_5k"

## inspect
tenx_5k_counts_to_integrate
```

We need to make sure the mutant data is compatible with the 10X data. the 10X data has fewer genes represented so we need to find the intersect of the two before integration.
```{r integration ss2 setup}
## extract SS2 data 
mutant_counts_for_integration <- as.matrix(ss2_mutants_final@assays$RNA@counts)
mutant_pheno_for_integration <- ss2_mutants_final@meta.data

## change counts so the :rRNA and :tRNA are not there:
rownames(mutant_counts_for_integration) <- gsub(":ncRNA", "", gsub(":rRNA", "", gsub(":tRNA", "", rownames(mutant_counts_for_integration))))

## change the gene names so that they are - rather than _:
rownames(mutant_counts_for_integration) <- gsub("_", "-", rownames(mutant_counts_for_integration))

## calculate how many of the genes overlap - 10x does start out with 5098 vs 5245
genes_in_tenx_dataset <- intersect(rownames(tenx_5k_counts), rownames(mutant_counts_for_integration))
## print number of genes that overlap
dim(mutant_counts_for_integration)
## subset the mutant counts to contain only 10x genes
mutant_counts_for_integration <- mutant_counts_for_integration[which(rownames(mutant_counts_for_integration) %in% genes_in_tenx_dataset), ]
## print result of genes that overlap
dim(mutant_counts_for_integration)

## make Seurat object:
GCSKO_mutants <- CreateSeuratObject(counts = mutant_counts_for_integration, meta.data = mutant_pheno_for_integration, min.cells = 0, min.features = 0, project = "GCSKO")

## add experiment meta data
GCSKO_mutants@meta.data$experiment <- "mutants"

## inspect
GCSKO_mutants
```

```{r}
## double check that this is the same number of genes
## subset counts so that only genes represented in the other two objects are there:
length(intersect(rownames(tenx_5k_counts), rownames(mutant_counts_for_integration)))
```

IMPORTANT - this next step is different to GCSKO_merge as it subsets the smart-seq2 data into wild-type only. 
```{r}
## subset wt on ss2 data:
ss2_wt_cells <- rownames(GCSKO_mutants@meta.data[GCSKO_mutants@meta.data$genotype == "WT", ])
GCSKO_mutants_wtonly <- subset(GCSKO_mutants, cells = ss2_wt_cells)
```

create list and normalise:
```{r integration normalise}
## make list
tenx.justwt.list <- list(tenx_5k_counts_to_integrate, GCSKO_mutants_wtonly)

## prepare data
for (i in 1:length(tenx.justwt.list)) {
    tenx.justwt.list[[i]] <- NormalizeData(tenx.justwt.list[[i]], verbose = FALSE)
    tenx.justwt.list[[i]] <- FindVariableFeatures(tenx.justwt.list[[i]], selection.method = "vst", 
        nfeatures = 2000, verbose = FALSE)
    all.genes <- rownames(tenx.justwt.list[[i]])
    tenx.justwt.list[[i]] <- ScaleData(tenx.justwt.list[[i]], features = all.genes)
}
```

### Integrate objects

```{r integration}
## Find anchors
tenx.justwt.anchors <- FindIntegrationAnchors(object.list = tenx.justwt.list, dims = 1:21, verbose = FALSE)

## Integrate data
tenx.justwt.integrated <- IntegrateData(anchorset = tenx.justwt.anchors, dims = 1:21, verbose = FALSE, features.to.integrate = genes_in_tenx_dataset)
```

# 4. Dimensionality reduction {.tabset}

### PCA
```{r, fig.width = 10, fig.height = 5}
## Make the default assay integrated
DefaultAssay(tenx.justwt.integrated) <- "integrated"

## Run the standard workflow for visualization and clustering
tenx.justwt.integrated <- ScaleData(tenx.justwt.integrated, verbose = FALSE)
tenx.justwt.integrated <- RunPCA(tenx.justwt.integrated, npcs = 30, verbose = FALSE)

## inspect PCs
ElbowPlot(tenx.justwt.integrated, ndims = 30, reduction = "pca")
```

### Optimised UMAP
After optimisation, the following UMAP can be calculated:
```{r umap run 2, fig.height = 7, fig.width = 7}
## Run optimised UMAP
tenx.justwt.integrated <- RunUMAP(tenx.justwt.integrated, reduction = "pca", dims = 1:10, n.neighbors = 50, seed.use = 1234, min.dist = 0.4, repulsion.strength = 0.03, local.connectivity = 150)
```

```{r umap visualise 2}
## plot
dp1 <- DimPlot(tenx.justwt.integrated, label = TRUE, repel = FALSE, pt.size = 0.05, dims = c(1,2), split.by = "experiment") + 
  ## fix the axis
  coord_fixed() #+ 
  ## reverse the scale
  #scale_y_reverse()

## view
dp1
```

Make final plots:
```{r, fig.height = 10, fig.width = 10}

## split seurat object up
ob.list <- SplitObject(tenx.justwt.integrated, split.by = "experiment")

## make plots for each object
plot.list <- lapply(X = ob.list, FUN = function(x) {
    DimPlot(x, dims = c(1,2), pt.size = 1) + theme(legend.position="bottom")
})

composition_umap_10x <- plot.list$`tenx_5k` + 
  coord_fixed() +
  theme_void() +
  scale_color_manual(values=c(replicate(45, "#999999"))) +
  labs(title = paste("10x (wild-type)")) +
  theme(legend.position = "none", plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold"))

composition_umap_ss2 <- plot.list$mutants +
  coord_fixed() +
  theme_void() +
  scale_color_manual(values=c(replicate(46, "#999999"))) +
  labs(title = paste("Smart-seq2 (wild-type)")) +
  theme(legend.position = "none", plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold"))

composition_umap <- composition_umap_10x + composition_umap_ss2 

composition_umap
```

save
```{r}
ggsave("../images_to_export/composition_umap.png", plot = composition_umap, device = "png", path = NULL, scale = 1, width = 20, height = 20, units = "cm", dpi = 300, limitsize = TRUE)
```

```{r}
## make plots
## hoo dataset correlation
UMAP_hoo <- DimPlot(tenx.justwt.integrated, pt.size = 0.1, group.by = "Prediction.Spearman.", dims = c(1,2)) +
  coord_fixed() + 
  theme_void() +
  labs(title = paste("Hoo Predicted Timepoint")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) +
  scale_colour_manual(values = inferno(12))  +
  labs(colour = "hour") +
  theme(legend.position = "bottom", legend.title=element_text(size=10))

## ap2g timecourse in this paper correlation
UMAP_kasia <- DimPlot(tenx.justwt.integrated, pt.size = 0.1, group.by = "Prediction.Spearman._Kasia", dims = c(1,2)) +
  coord_fixed() + 
  theme_void() +
  labs(title = paste("AP2G Timecourse Predicted Timepoint")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) +
  scale_colour_manual(values = inferno(10))  +
  labs(colour = "hour") +
  theme(legend.position = "bottom", legend.title=element_text(size=10))

## combine
umap_bulk <- wrap_plots(UMAP_hoo, UMAP_kasia, ncol = 2)

## print
umap_bulk
```

## MCA Mapping

## Build the index

```{r}
## load in mca data
counts <- read.csv("../data/Reference/scmap/allpb10x_counts.csv", row.names = 1)
pheno <- read.csv("../data/Reference/scmap/allpb10x_pheno.csv")

## change dash
rownames(counts) <- gsub("_", "-", rownames(counts))

## load MCA pal
mca_colors <- c("6"="#78C679",
            "2"="#D1EC9F",
            "0"="#FEB24C",
            "1"="#F4CF63",
            "3"="#FEEEAA",
            "4"="#85B1D3",
            "7"="#9ecae1",
            "5"="#C9E8F1",
            "M"= "#B7B7D8",
            "F"="#9C96C6",
            "unassigned"="darkgrey")

## plot
ggplot(pheno, aes(x=PC2_3d, y = PC3_3d, colour=absclust3)) + 
  geom_point() +
  theme_classic() +
  scale_color_manual(values = mca_colors, labels = (c("0 - early troph", "1 - mid troph","2 - late ring", "3 - late troph","4 - early schizont", "5 - late schizont", "6 - early ring","7 - mid schizont", "female", "male", "unassigned")))
```

```{r}
###Making an ortholog reference index

## load required libraries
library(scmap)
library(SingleCellExperiment)

#prep the SCE, if was originally a Suerat object need the dfs to be regular matrices
#pb_filtered_sce_orth <- pb_filtered_sce_orth[, colData(pb_filtered_sce_orth)$absclust3 != "8"]
#sce <- pb_filtered_sce_orth
#pca <- plotPCA(sce)
#pcs <- pca$data
#table(rownames(pcs)==colnames(sce))
#colData(sce) <- cbind(colData(sce), pcs)
#rowData(sce)$feature_symbol <- rowData(sce)$gene
sce <- SingleCellExperiment(list(counts=counts),
    colData=DataFrame(label=pheno),
    rowData=DataFrame(feature_symbol=rownames(counts)))
sce
counts_1 <- assay(sce, "counts")
libsizes <- colSums(counts_1)
size.factors <- libsizes/mean(libsizes)
logcounts(sce) <- log2(t(t(counts_1)/size.factors) + 1)
#counts(sce) <- as.matrix(counts(sce))
#logcounts(sce) <- as.matrix(logcounts(sce))
# remove features with duplicated names
sce <- sce[!duplicated(rownames(sce)), ]

#build scmap-cell reference index, save this rds
sce <- selectFeatures(sce, suppress_plot = FALSE, n_features = 500)
table(rowData(sce)$scmap_features)

set.seed(1)
sce <- indexCell(sce)
names(metadata(sce)$scmap_cell_index)
length(metadata(sce)$scmap_cell_index$subcentroids)
dim(metadata(sce)$scmap_cell_index$subcentroids[[1]])
metadata(sce)$scmap_cell_index$subcentroids[[1]][,1:5]
dim(metadata(sce)$scmap_cell_index$subclusters)

#saveRDS(pb_filtered_sce_orth, file="pb_filtered_sce_orthindex_20181109.rds")
```

## Map to the index

```{r}
## convert seurat object above into sce for this
pf_field_counts <- as.matrix(tenx.justwt.integrated@assays$RNA@data)
pf_field_pheno <- tenx.justwt.integrated@meta.data

#prep the SCE, if was originally a Suerat object need the dfs to be regular matrices
#pb_filtered_sce_orth <- pb_filtered_sce_orth[, colData(pb_filtered_sce_orth)$absclust3 != "8"]
#sce <- pb_filtered_sce_orth
#pca <- plotPCA(sce)
#pcs <- pca$data
#table(rownames(pcs)==colnames(sce))
#colData(sce) <- cbind(colData(sce), pcs)
#rowData(sce)$feature_symbol <- rowData(sce)$gene
pf_live_field.sce.orth <- SingleCellExperiment(list(counts=pf_field_counts),
    colData=DataFrame(label=pf_field_pheno),
    rowData=DataFrame(feature_symbol=rownames(pf_field_counts)))
pf_live_field.sce.orth
counts_1 <- assay(pf_live_field.sce.orth, "counts")
libsizes <- colSums(counts_1)
size.factors <- libsizes/mean(libsizes)
logcounts(pf_live_field.sce.orth) <- log2(t(t(counts_1)/size.factors) + 1)


#Project query data set onto cell index
scmapCell_results <- scmapCell(
  pf_live_field.sce.orth, 
  list(mca = metadata(sce)$scmap_cell_index
  )
)

##Look into the results
# For each dataset there are two matricies. cells matrix contains the top 10 (scmap default) cell IDs of the cells of the reference dataset that a given cell of the projection dataset is closest to:
#   
#   Give assignments in two ways:
#   1. Take the top cell assignment abs clust, if cosine similarity is less than 0.4 (or adjust if needed) mark as unassigned
# 2. For the top 3 nearest neighbors, get a mean of the PCA coordinates and snap to the nearest cell of those coordinates. If any of the top three cells are sim below 0.4 then mark as unassigned.


##Top cell assignment method
scmapCell_results$mca$cells[, 1:3]
getcells <- scmapCell_results$mca$cells[1, ]
cdsce <- colData(sce)[getcells, ]
topsim <- scmapCell_results$mca$similarities[1, ]

pf_live_field.sce.orth$topcell <- cdsce$label.sample_id
pf_live_field.sce.orth$topcell_ac <- cdsce$label.absclust3
pf_live_field.sce.orth$indexPC1 <- cdsce$label.PC1
pf_live_field.sce.orth$indexPC2 <- cdsce$label.PC2
#pf_live_field.sce.orth$pbpt <- cdsce$pseudotime
#pf_live_field.sce.orth$pbbulk <- cdsce$bulk
pf_live_field.sce.orth$topcell_sp <- pf_live_field.sce.orth$topcell_ac
pf_live_field.sce.orth$topsim <- topsim
pf_live_field.sce.orth$topcell_sp[pf_live_field.sce.orth$topsim < 0.3] <- "unassigned"
table(pf_live_field.sce.orth$topcell_sp)
```

add these metrics back into the object 

```{r}
## add the meta data from scmap to the object
# live
## extract
df_plot_live <- data.frame(topcell_sp = colData(pf_live_field.sce.orth)$topcell_sp, topsim = colData(pf_live_field.sce.orth)$topsim, row.names = rownames(colData(pf_live_field.sce.orth)))

## add to object
tenx.justwt.integrated <- AddMetaData(
  object = tenx.justwt.integrated,
  metadata = df_plot_live,
  col.name = c('mca_topcell_sp', 'mca_topsim')
)

## need to fix the NA in topcell_sp
# Un-factorize (as.numeric can be use for numeric values)
#              (as.vector  can be use for objects - not tested)
tenx.justwt.integrated@meta.data$mca_topcell_sp <- as.character(tenx.justwt.integrated@meta.data$mca_topcell_sp)
## add unassigned
tenx.justwt.integrated@meta.data$mca_topcell_sp[is.na(tenx.justwt.integrated@meta.data$mca_topcell_sp)] <- "unassigned"
# Re-factorize with the as.factor function or simple factor(fixed$Type)
tenx.justwt.integrated@meta.data$mca_topcell_sp <- as.factor(tenx.justwt.integrated@meta.data$mca_topcell_sp)
## check
table(tenx.justwt.integrated@meta.data$mca_topcell_sp)
```

```{r}
## reorder levels
tenx.justwt.integrated@meta.data$mca_topcell_sp <- factor(tenx.justwt.integrated@meta.data$mca_topcell_sp, levels = rev(c("M", "F", "5", "7", "4", "3", "1", "0", "2", "6")))

UMAP_mca <- DimPlot(tenx.justwt.integrated, pt.size = 0.1, group.by = "mca_topcell_sp", dims = c(1,2)) +
  coord_fixed() + 
  theme_void() +
  labs(title = paste("MCA Predicted Cell Type"), colour = "MCA Cell Type") + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) +
  scale_color_manual(values = mca_colors, labels = (c("6 - early ring", "2 - late ring", "0 - early troph", "1 - mid troph", "3 - late troph", "4 - early schizont", "7 - mid schizont", "5 - late schizont", "female", "male"))) +
  theme(legend.position = "right", legend.title=element_text(size=15))

UMAP_mca
```

save
```{r}
ggsave("../images_to_export/UMAP_mca.png", plot = UMAP_mca, device = "png", path = NULL, scale = 1, width = 15, height = 15, units = "cm", dpi = 300, limitsize = TRUE)
```

# 5. Clustering {.tabset} 

### Generate clusters

```{r}
## generate new clusters at low resolution
tenx.justwt.integrated <- FindNeighbors(tenx.justwt.integrated, dims = 1:19, reduction = "pca")
tenx.justwt.integrated <- FindClusters(tenx.justwt.integrated, resolution = 2, random.seed = 42, algorithm = 2)
```

### visualise clusters

```{r}
DimPlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE, repel = FALSE, label = TRUE)
```

Make individual plots highlighting where cells in each cluster fall
```{r, echo = FALSE, message=FALSE}
## for loop which takes each cluster and makes a list of cells and then plots a highlighted plot and adds it to a list

## make a blank list
list_UMAPs_by_cluster <- vector(mode = "list", length = length(levels(tenx.justwt.integrated@meta.data$integrated_snn_res.2)))

## for loop
for(i in seq_along(levels(tenx.justwt.integrated@meta.data$integrated_snn_res.2))){
  ## make a list of cells
  list_of_cells <- rownames(tenx.justwt.integrated@meta.data[which(tenx.justwt.integrated@meta.data$integrated_snn_res.2 == levels(tenx.justwt.integrated@meta.data$integrated_snn_res.2)[i]), ])
  umap_plot <- DimPlot(tenx.justwt.integrated, label = FALSE, repel = TRUE, pt.size = 0.1, cells.highlight = list_of_cells, dims = c(1,2), reduction = "umap") +
    ## fix coordinates
    coord_fixed() + 
  scale_color_manual(values=c("#D3D3D3", "#1D1564")) + 
  theme_void() + 
  labs(title = paste("cluster", levels(tenx.justwt.integrated@meta.data$integrated_snn_res.2)[i])) + 
  theme(plot.title = element_text(hjust = 0.5), legend.position = "none")
  ## add to the list
  list_UMAPs_by_cluster[[i]] <- umap_plot
}

## check number of clusters
length(list_UMAPs_by_cluster)
```

plot
```{r, fig.height = 10, fig.width = 10}
## this function writes the next bit of code for you
## put it into the console and paste the response
#ploty <- c()
#for(i in seq_along(levels(tenx.justwt.integrated@meta.data$seurat_clusters))){
#  ploty <- paste0(ploty, "list_UMAPs_by_cluster[[", i, "]]", " + ")
#}

## plot
list_UMAPs_by_cluster[[1]] + list_UMAPs_by_cluster[[2]] + list_UMAPs_by_cluster[[3]] + list_UMAPs_by_cluster[[4]] + list_UMAPs_by_cluster[[5]] + list_UMAPs_by_cluster[[6]] + list_UMAPs_by_cluster[[7]] + list_UMAPs_by_cluster[[8]] + list_UMAPs_by_cluster[[9]] + list_UMAPs_by_cluster[[10]] + list_UMAPs_by_cluster[[11]] + list_UMAPs_by_cluster[[12]] + list_UMAPs_by_cluster[[13]] + list_UMAPs_by_cluster[[14]] + list_UMAPs_by_cluster[[15]] + list_UMAPs_by_cluster[[16]] + list_UMAPs_by_cluster[[17]] + list_UMAPs_by_cluster[[18]] + list_UMAPs_by_cluster[[19]] + list_UMAPs_by_cluster[[20]] + list_UMAPs_by_cluster[[21]] + list_UMAPs_by_cluster[[22]] + list_UMAPs_by_cluster[[23]] + list_UMAPs_by_cluster[[24]]
```

### Find Markers

```{r}
# PBANKA-1418100        GCSKO-17  FD3   
# PBANKA-0102400         GCSKO-2  MD4 
# PBANKA-0716500        GCSKO-19  MD5 
# PBANKA-1435200        GCSKO-20  FD4 
# PBANKA-0902300        GCSKO-13  FD2
# PBANKA-0413400    GCSKO-10_820  MD3
# PBANKA-0828000         GCSKO-3  GD1
# PBANKA-1302700       GCSKO-oom  MD1 
# PBANKA-1447900        GCSKO-29  MD2
# PBANKA-1454800        GCSKO-21  FD1
# PBANKA-1144800        GCSKO-28  FD5

DefaultAssay(tenx.justwt.integrated) <- "RNA"
# find markers for every cluster compared to all remaining cells, report only the positive ones
pb.markers <- FindAllMarkers(tenx.justwt.integrated, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25, test.use = "MAST")
pb.markers %>% group_by(cluster) %>% top_n(n = 5, wt = avg_logFC)
```

```{r}
markers_annotated <- merge(pb.markers, gene_annotations,  by.x = "gene", by.y = "Gene.ID", all = FALSE)
View(markers_annotated)
```

```{r, R.options = list(width = 300)}
markers_subset <- pb.markers %>% group_by(cluster) %>% top_n(n = 20, wt = avg_logFC)
markers_subset_annotated <- merge(markers_subset, gene_annotations,  by.x = "gene", by.y = "Gene.ID", all = FALSE)
markers_subset_annotated
View(markers_subset_annotated)
```

```{r}
View(markers_annotated_mutant <- markers_annotated[which(markers_annotated$gene %in% list_of_mutant_genes), ])
```


### visualise markers

```{r, fig.height = 10, fig.width = 10}
## get the top 10 for each cluster
top10 <- pb.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)

## scale data is missing - but original dfs have scale.data
# df_scaled <- as.matrix(transform(merge(as.data.frame(tenx.justwt.list[[1]]@assays$RNA@scale.data), as.data.frame(tenx.justwt.list[[2]]@assays$RNA@scale.data), by = 0, all=FALSE), row.names=Row.names, Row.names=NULL))
# tenx.justwt.integrated@assays$RNA@scale.data <- df_scaled

## scale data in RNA slot
all.genes <- rownames(tenx.justwt.integrated)
tenx.justwt.integrated <- ScaleData(tenx.justwt.integrated, vars.to.regress = 'experiment', features = all.genes)

## heatmap
DoHeatmap(tenx.justwt.integrated, features = top10$gene) + NoLegend()
```

# 6. Define Cluster Identities {.tabset} 

### Downsampling
```{r}
# ## check number of cells in each cluster
# df_clusters <- as.data.frame(table(tenx.justwt.integrated@meta.data$integrated_snn_res.2))
# plot(df_clusters)
# 
# ## downsample using appropriate metrics
# tenx.justwt.integrated.downsampled <- subset(tenx.justwt.integrated, downsample = 100)
#   
# ## inspect plot
# FeaturePlot(tenx.justwt.integrated.downsampled, features = "PBANKA-1437500", coord.fixed = TRUE, dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
#   theme_void() + 
#   labs(title = paste("AP2G (Commitment)")) + 
#   theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
#   scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))
```

### Marker Genes

useful tools for all plots
```{r}
## define male and female symbol
female_symbol <- intToUtf8(9792)
male_symbol <- intToUtf8(9794)

## define colour pal
marker_gene_ramp <- colorRampPalette(c("#D3D3D3", "#1D1564"))(50)
```

#### Expression - cutoffs - purple

```{r, fig.height = 15, fig.width = 15}

marker_gene_plot_CCP2 <- FeaturePlot(tenx.justwt.integrated, 
                                     features = "PBANKA-1319500", 
                                     coord.fixed = TRUE, 
                                     min.cutoff = "q05", 
                                     max.cutoff = "q95", 
                                     dims = c(1,2), 
                                     reduction = "umap", 
                                     pt.size = 0.5, 
                                     order = TRUE, 
                                     slot = 'scale.data') + 
                          coord_fixed() +
                          theme_void() + 
                          labs(title = expression(italic(ccp2)~(Female))) + 
                          theme(plot.title = element_text(hjust = 0.5), 
                                legend.text = element_text(size = 5), 
                                text=element_text(size=6.0), 
                                plot.margin = unit(c(0, 0, 0, 0), "cm")
                                ) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

marker_gene_plot_MG1 <- FeaturePlot(tenx.justwt.integrated, 
                                    features = "PBANKA-0416100", 
                                    coord.fixed = TRUE, 
                                    min.cutoff = "q05", 
                                    max.cutoff = "q95", 
                                    dims = c(1,2), 
                                    reduction = "umap", 
                                    pt.size = 0.5, 
                                    order = TRUE, 
                                    slot = 'scale.data') +
                        coord_fixed() + 
                        theme_void() + 
                        labs(title = expression(italic(mg1)~(Male))) + 
                        theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 5), text=element_text(size=6.0), plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
                        scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

marker_gene_plot_AP2G <- FeaturePlot(tenx.justwt.integrated, 
                                     features = "PBANKA-1437500", 
                                     coord.fixed = TRUE, 
                                     min.cutoff = "q05", 
                                     max.cutoff = "q95", 
                                     dims = c(1,2), 
                                     reduction = "umap", 
                                     pt.size = 0.5, 
                                     order = TRUE, 
                                     slot = 'scale.data') + 
                          coord_fixed() +
                          theme_void() + 
                          labs(title = expression(italic(ap2g)~(Commitment))) + 
                          theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 5), text=element_text(size=6.0), plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

marker_gene_plot_MSP1 <- FeaturePlot(tenx.justwt.integrated, 
                                     features = "PBANKA-0831000", 
                                     coord.fixed = TRUE, 
                                     min.cutoff = "q05", 
                                     max.cutoff = "q95", 
                                     dims = c(1,2), 
                                     reduction = "umap", 
                                     pt.size = 0.5, 
                                     order = TRUE, 
                                     slot = 'scale.data') + 
                          coord_fixed() +
                          theme_void() + 
                          labs(title = expression(italic(msp1)~(Schizont))) + 
                          theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 5), text=element_text(size=6.0), plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

marker_gene_plot_MSP8 <- FeaturePlot(tenx.justwt.integrated, 
                                     features = "PBANKA-1102200", 
                                     coord.fixed = TRUE, 
                                     min.cutoff = "q05", 
                                     max.cutoff = "q95", 
                                     dims = c(1,2), 
                                     reduction = "umap", 
                                     pt.size = 0.5, 
                                     order = TRUE, 
                                     slot = 'scale.data') + 
                          coord_fixed() +
                          theme_void() + 
                          labs(title = expression(italic(msp8)~(Asexual))) + 
                          theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 5), text=element_text(size=6.0), plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

marker_gene_plot_SBP1 <- FeaturePlot(tenx.justwt.integrated, 
                                     features = "PBANKA-1101300", 
                                     coord.fixed = TRUE, 
                                     min.cutoff = "q05", 
                                     max.cutoff = "q95", 
                                     dims = c(1,2), 
                                     reduction = "umap", 
                                     pt.size = 0.5, 
                                     order = TRUE,
                                     slot = 'scale.data') +
                          coord_fixed() + 
                          theme_void() + 
                          labs(title = expression(italic(SBP1)~(Ring))) + 
                          theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 5), text=element_text(size=6.0), plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

marker_gene_plot_FAMB <- FeaturePlot(tenx.justwt.integrated, 
                                     features = "PBANKA-0722600", 
                                     coord.fixed = TRUE, 
                                     min.cutoff = "q05", 
                                     max.cutoff = "q95", 
                                     dims = c(1,2), 
                                     reduction = "umap", 
                                     pt.size = 0.5, 
                                     order = TRUE,
                                     slot = 'scale.data') +
                          coord_fixed() + 
                          theme_void() + 
                          labs(title = expression(italic(fam-b2)~(Ring))) + 
                          theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 5), text=element_text(size=6.0), plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

marker_gene_plot_ORC1 <- FeaturePlot(tenx.justwt.integrated, 
                                     features = "PBANKA-0602000", 
                                     coord.fixed = TRUE, 
                                     min.cutoff = "q05", 
                                     max.cutoff = "q95", 
                                     dims = c(1,2), 
                                     reduction = "umap", 
                                     pt.size = 0.5, 
                                     order = TRUE,
                                     slot = 'scale.data') +
                          coord_fixed() + 
                          theme_void() + 
                          labs(title = expression(italic(orc1)~(DNA~replication))) + 
                          theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 5), text=element_text(size=6.0), plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

marker_gene_plot_CDPK5 <- FeaturePlot(tenx.justwt.integrated, 
                                      features = "PBANKA-1351500", 
                                      coord.fixed = TRUE, 
                                      min.cutoff = "q05", 
                                      max.cutoff = "q95", 
                                      dims = c(1,2), 
                                      reduction = "umap", 
                                      pt.size = 0.5, 
                                      order = TRUE, 
                                      slot = 'scale.data') +
                          coord_fixed() +
                          theme_void() + 
                          labs(title = expression(italic(cdpk5)~(Schizont))) + 
                          theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 5), text=element_text(size=6.0), plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

marker_gene_plot_MCM4 <- FeaturePlot(tenx.justwt.integrated, 
                                      features = "PBANKA-1415600", 
                                      coord.fixed = TRUE, 
                                      min.cutoff = "q05", 
                                      max.cutoff = "q95", 
                                      dims = c(1,2), 
                                      reduction = "umap", 
                                      pt.size = 0.5, 
                                      order = TRUE, 
                                      slot = 'scale.data') +
                          coord_fixed() +
                          theme_void() + 
                          labs(title = expression(italic(mcm4)~(DNA~replication))) + 
                          theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 5), text=element_text(size=6.0), plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))

# + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

## plot
## cowplot method
marker_gene_plot_all <- plot_grid(marker_gene_plot_FAMB, marker_gene_plot_MSP8, marker_gene_plot_MSP1, marker_gene_plot_AP2G, marker_gene_plot_CCP2, marker_gene_plot_MG1, marker_gene_plot_HSP70, nrow=3)

marker_gene_plot_all
```

### Mutant Genes

#### Expression - with cutoffs

```{r, fig.height = 10, fig.width = 10}
# PBANKA-1418100        GCSKO-17  FD3   
# PBANKA-0102400         GCSKO-2  MD4 
# PBANKA-0716500        GCSKO-19  MD5 
# PBANKA-1435200        GCSKO-20  FD4 
# PBANKA-0902300        GCSKO-13  FD2
# PBANKA-0413400    GCSKO-10_820  MD3
# PBANKA-0828000         GCSKO-3  GD1
# PBANKA-1302700       GCSKO-oom  MD1 
# PBANKA-1447900        GCSKO-29  MD2
# PBANKA-1454800        GCSKO-21  FD1
# PBANKA-1144800        GCSKO-28  FD5


marker_gene_plot_fd3 <- FeaturePlot(tenx.justwt.integrated, 
                                   dims = c(1,2), 
                                   reduction = "umap", 
                                   features = "PBANKA-1418100", 
                                   coord.fixed = TRUE,
                                   min.cutoff = "q05", 
                                   max.cutoff = "q95", 
                                   pt.size = 0.5, 
                                   order = TRUE, 
                                   slot = 'scale.data') + 
                        theme_void() + 
                        labs(title = expression(italic(fd3))) + 
                        theme(plot.title = element_text(hjust = 0.5), 
                                legend.text = element_text(size = 5), 
                                text=element_text(size=7), 
                                plot.margin = unit(c(0, 0, 0, 0), "cm")
                                ) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))
                        ## add sex symbols
                        #annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
                        #annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_md4 <- FeaturePlot(tenx.justwt.integrated, 
                                  dims = c(1,2), 
                                  reduction = "umap", 
                                  features = "PBANKA-0102400", 
                                  coord.fixed = TRUE, 
                                  min.cutoff = "q05",
                                  max.cutoff = "q95", 
                                  pt.size = 0.5, 
                                  order = TRUE, 
                                  slot = 'scale.data') + 
                      theme_void() + 
                      labs(title = expression(italic(md4))) + 
                      theme(plot.title = element_text(hjust = 0.5), 
                                legend.text = element_text(size = 5), 
                                text=element_text(size=7), 
                                plot.margin = unit(c(0, 0, 0, 0), "cm")
                                ) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))
                      ## add sex symbols
                      #annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
                      #annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_md5 <- FeaturePlot(tenx.justwt.integrated, 
                                   dims = c(1,2), 
                                   reduction = "umap", 
                                   features = "PBANKA-0716500", 
                                   coord.fixed = TRUE, 
                                   min.cutoff = "q05", 
                                   max.cutoff = "q95",
                                   pt.size = 0.5, 
                                   order = TRUE, 
                                   slot = 'scale.data') + 
                       theme_void() + 
                       labs(title = expression(italic(md5))) + 
                       theme(plot.title = element_text(hjust = 0.5), 
                                legend.text = element_text(size = 5), 
                                text=element_text(size=7), 
                                plot.margin = unit(c(0, 0, 0, 0), "cm")
                                ) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = "")) 
                       ## add sex symbols
                       #annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
                       #annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_fd4 <- FeaturePlot(tenx.justwt.integrated, 
                                   dims = c(1,2), 
                                   reduction = "umap", 
                                   features = "PBANKA-1435200", 
                                   coord.fixed = TRUE, 
                                   min.cutoff = "q05", 
                                   max.cutoff = "q95", 
                                   pt.size = 0.5, 
                                   order = TRUE, 
                                   slot = 'scale.data') + 
                       theme_void() + 
                       labs(title = expression(italic(fd4))) + 
                       theme(plot.title = element_text(hjust = 0.5), 
                                legend.text = element_text(size = 5), 
                                text=element_text(size=7), 
                                plot.margin = unit(c(0, 0, 0, 0), "cm")
                                ) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = "")) 
                       ## add sex symbols
                       #annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
                       #annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_fd2 <- FeaturePlot(tenx.justwt.integrated, 
                                    dims = c(1,2), 
                                    reduction = "umap", 
                                    features = "PBANKA-0902300", 
                                    coord.fixed = TRUE, 
                                    min.cutoff = "q05", 
                                    max.cutoff = "q95", 
                                    pt.size = 0.5, 
                                    order = TRUE, 
                                    slot = 'scale.data') + 
                       theme_void() + 
                       labs(title = expression(italic(fd2))) + 
                       theme(plot.title = element_text(hjust = 0.5), 
                                legend.text = element_text(size = 5), 
                                text=element_text(size=7), 
                                plot.margin = unit(c(0, 0, 0, 0), "cm")
                                ) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))
                       ## add sex symbols
                       # + annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
                       #annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_md3 <- FeaturePlot(tenx.justwt.integrated, 
                                   dims = c(1,2), 
                                   reduction = "umap", 
                                   features = "PBANKA-0413400", 
                                   coord.fixed = TRUE, 
                                   min.cutoff = "q05", 
                                   max.cutoff = "q95",
                                   pt.size = 0.5, 
                                   order = TRUE, 
                                   slot = 'scale.data') + 
                       theme_void() + 
                       labs(title = expression(italic(md3))) + 
                       theme(plot.title = element_text(hjust = 0.5), 
                                legend.text = element_text(size = 5), 
                                text=element_text(size=7), 
                                plot.margin = unit(c(0, 0, 0, 0), "cm")
                                ) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))
                       ## add sex symbols
                      #annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
                       #annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_gd1 <- FeaturePlot(tenx.justwt.integrated, 
                                  dims = c(1,2), 
                                  reduction = "umap",
                                  features = "PBANKA-0828000", 
                                  coord.fixed = TRUE, 
                                  min.cutoff = "q05", 
                                  max.cutoff = "q95",
                                  pt.size = 0.5, 
                                  order = TRUE, 
                                  slot = 'scale.data') + 
                       theme_void() + 
                       labs(title = expression(italic(gd1))) + 
                       theme(plot.title = element_text(hjust = 0.5), 
                                legend.text = element_text(size = 5), 
                                text=element_text(size=7), 
                                plot.margin = unit(c(0, 0, 0, 0), "cm")
                                ) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))
                       ## add sex symbols
                       #annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
                       #annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_md1 <- FeaturePlot(tenx.justwt.integrated, 
                                    dims = c(1,2), 
                                    reduction = "umap", 
                                    features = "PBANKA-1302700",
                                    coord.fixed = TRUE,
                                    min.cutoff = "q05", 
                                    max.cutoff = "q95", 
                                    pt.size = 0.5, 
                                    order = TRUE, 
                                    slot = 'scale.data') + 
                       theme_void() + 
                       labs(title = expression(italic(md1))) + 
                       theme(plot.title = element_text(hjust = 0.5), 
                                legend.text = element_text(size = 5), 
                                text=element_text(size=7), 
                                plot.margin = unit(c(0, 0, 0, 0), "cm")
                                ) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))
                       ## add sex symbols
                       #annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
                       #annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_md2 <- FeaturePlot(tenx.justwt.integrated,
                                   dims = c(1,2), 
                                   reduction = "umap",
                                   features = "PBANKA-1447900",
                                   coord.fixed = TRUE,
                                   min.cutoff = "q05", 
                                   max.cutoff = "q95", 
                                   pt.size = 0.5, 
                                   order = TRUE, 
                                   slot = 'scale.data') + 
                       theme_void() + 
                       labs(title = expression(italic(md2))) + 
                       theme(plot.title = element_text(hjust = 0.5), 
                                legend.text = element_text(size = 5), 
                                text=element_text(size=7), 
                                plot.margin = unit(c(0, 0, 0, 0), "cm")
                                ) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))
                       ## add sex symbols
                       #annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
                       #annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_fd1 <- FeaturePlot(tenx.justwt.integrated, 
                                    dims = c(1,2), 
                                    reduction = "umap", 
                                    features = "PBANKA-1454800", 
                                    coord.fixed = TRUE, 
                                    min.cutoff = "q05",
                                    max.cutoff = "q95", 
                                    pt.size = 0.5, 
                                    order = TRUE, 
                                    slot = 'scale.data') + 
                       theme_void() + 
                       labs(title = expression(italic(fd1))) + 
                       theme(plot.title = element_text(hjust = 0.5), 
                                legend.text = element_text(size = 5), 
                                text=element_text(size=7), 
                                plot.margin = unit(c(0, 0, 0, 0), "cm")
                                ) + 
                          scale_color_continuous_sequential(palette = "Purples 2", labels = number_format(accuracy = 0.1)) +
                          guides(colour = guide_colourbar(barwidth = 0.3, barheight = 3.0, title = ""))
                       ## add sex symbols
                       #annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
                       #annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

##original label:
# labs(title = paste("(CCP2; Female)","\n", "PBANKA_1319500"))
# scale_color_continuous_sequential(palette = "Purples 2")
# scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

## make composite plot
mutant_expression_composite <- wrap_plots(marker_gene_plot_gd1 , marker_gene_plot_md1 , marker_gene_plot_md2 , marker_gene_plot_md3 , marker_gene_plot_md4 , marker_gene_plot_md5 , marker_gene_plot_fd1 , marker_gene_plot_fd2 , marker_gene_plot_fd3 , marker_gene_plot_fd4, ncol = 4)
           
## print
mutant_expression_composite
```

save
```{r}
ggsave("../images_to_export/ALL_CELLS_wt_gene_expression.png", plot = mutant_expression_composite, device = "png", path = NULL, scale = 1, width = 21, height = 29.7, units = "cm", dpi = 300, limitsize = TRUE)
```

```{r, fig.height = 10, fig.width = 10}
# PBANKA-1418100        GCSKO-17  FD3   
# PBANKA-0102400         GCSKO-2  MD3 
# PBANKA-0716500        GCSKO-19  MD4 
# PBANKA-1435200        GCSKO-20  FD4 
# PBANKA-0902300        GCSKO-13  FD2
# PBANKA-0413400    GCSKO-10_820  MD5
# PBANKA-0828000         GCSKO-3  GD1
# PBANKA-1302700       GCSKO-oom  MD1 
# PBANKA-1447900        GCSKO-29  MD2
# PBANKA-1454800        GCSKO-21  FD1
# PBANKA-1144800        GCSKO-28  FD5


marker_gene_plot_17 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1418100", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("fd3")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_2 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0102400", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md3")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_19 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0716500", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md4")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_20 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1435200", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("fd4")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_13 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0902300", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("fd2")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_10 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0413400", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md5")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_3 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0828000", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("gd1")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_oom <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1302700", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md1")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_29 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1447900", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md2")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_21 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1454800", coord.fixed = TRUE, min.cutoff = "q10", max.cutoff = "q90", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("fd1")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

##original label:
# labs(title = paste("(CCP2; Female)","\n", "PBANKA_1319500"))

## make composite plot
mutant_expression_composite <- wrap_plots(marker_gene_plot_17 , marker_gene_plot_2 , marker_gene_plot_19 , marker_gene_plot_20 , marker_gene_plot_13 , marker_gene_plot_10 , marker_gene_plot_3 , marker_gene_plot_oom , marker_gene_plot_29 , marker_gene_plot_21 , ncol = 4)
           
## print
mutant_expression_composite
```



# 7. Pseudotime on all cells {.tabset}

### Pseudotime calculation

```{r}
## extract data from Seurat
seurat.object.all <- tenx.justwt.integrated
# counts
data <- as(as.matrix(GetAssayData(seurat.object.all, assay = "integrated", slot = "data")), 'sparseMatrix')
# meta data
pd <- data.frame(seurat.object.all@meta.data)

## keep only the columns that are relevant
#pData <- pd %>% select(orig.ident, nCount_RNA, nFeature_RNA)
## add gene short name
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))

## Construct monocle cds
monocle.object.all <- new_cell_data_set(expression_data = data, cell_metadata = pd, gene_metadata = fData)
## preprocess
monocle.object.all = preprocess_cds(monocle.object.all, num_dim = 100, norm_method = "none")
## plot variance explained plot
#plot_pc_variance_explained(monocle.object.all)
## make monocle UMAP
#monocle.object.all = reduce_dimension(monocle.object.all, reduction_method = "UMAP", preprocess_method = "PCA", umap.metric = "euclidean", umap.n_neighbors = 20, umap.min_dist = 0.5, verbose = FALSE)
#plot_cells(monocle.object.all)

## add UMAP from Seurat
monocle.object.all@int_colData@listData$reducedDims@listData[["UMAP"]] <-seurat.object.all@reductions[["umap"]]@cell.embeddings 
plot_cells(monocle.object.all)

## cluster
monocle.object.all = cluster_cells(monocle.object.all)

## plot clusters
plot_cells(monocle.object.all, color_cells_by="partition", group_cells_by="partition",  x = 1, y = 2)

## reduce partitions to 1
monocle.object.all@clusters$UMAP$partitions[monocle.object.all@clusters$UMAP$partitions == "2"] <- "1"

#map pseudotime
monocle.object.all = learn_graph(monocle.object.all, learn_graph_control=list(ncenter=550, minimal_branch_len = 15), use_partition = FALSE)

plot_cells(monocle.object.all, color_cells_by="partition", group_cells_by="partition",  x = 1, y = 2)

## a helper function to identify the root principal points:
## make cluster 2 the root
# get_earliest_principal_node <- function(cds, time_bin="7"){
#   cell_ids <- which(colData(cds)[, "seurat_clusters"] == time_bin)
#   closest_vertex <-
#   cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
#   closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
#   root_pr_nodes <-
#   igraph::V(principal_graph(cds)[["UMAP"]])$name[as.numeric(names
#   (which.max(table(closest_vertex[cell_ids,]))))]
#   
#   root_pr_nodes
# }

## calculate pseudotime
#monocle.object.all = order_cells(monocle.object.all, root_pr_nodes=get_earliest_principal_node(monocle.object.all))
monocle.object.all = order_cells(monocle.object.all)
## used 5 points at the beginning


## plot
umap_pt <- plot_cells(monocle.object.all, color_cells_by = "pseudotime", label_cell_groups=FALSE, cell_size = 1, x = 1, y = 2, label_branch_points=FALSE, label_leaves=FALSE, label_groups_by_cluster=FALSE, label_roots = FALSE) +
  coord_fixed() +
  theme_void() +
  labs(title = "") +
  theme(plot.title = element_text(hjust = 0.5, size=20), legend.position="bottom", legend.title=element_text (size=20), legend.text=element_text(size=20)) + 
  guides(colour = guide_colourbar(barwidth = 10, barheight = 2, title = "Pseudotime"))

## view plot
umap_pt

## help was obtained from here
## https://github.com/satijalab/seurat/issues/1658
```

save
```{r}
ggsave("../images_to_export/UMAP_pt_wt.png", plot = umap_pt, device = "png", path = NULL, scale = 1, width = 20, height = 20, units = "cm", dpi = 300, limitsize = TRUE)
```

### gganimnate GIF of pseuodtime

Select appropriate cells
```{r}
plot <- DimPlot(tenx.justwt.integrated, label = FALSE, repel = TRUE, pt.size = 0.1, dims = c(1,2), reduction = "umap")
HoverLocator(plot, information = FetchData(object = tenx.justwt.integrated, vars = 'monocle_sex'))
```

```{r}
cells_to_remove <- c("SC26779_8_280", "CTGGTCTAGCAGATCG")
```

```{r}
#install.packages("gganimate")
library(gganimate)
#install.packages("gifski")
#install.packages("av")
#library(gifski)
#library(av)

## make dataframe for plotting
## extract data for GGplot version of this
df_animation <- as.data.frame(monocle.object.all@int_colData@listData$reducedDims@listData[["UMAP"]])
## add pt to this data frame:
pt_values <- as.data.frame(pseudotime(monocle.object.all, reduction_method = "UMAP"))
df_animation <- merge(df_animation, pt_values, by="row.names") 
rownames(df_animation) <- df_animation$Row.names
colnames(df_animation)[4] <- "pt"

## make the static plot
p <- ggplot(df_animation, aes(x = UMAP_1, y = UMAP_2, colour = pt)) +
  geom_point() +
  scale_colour_viridis_c(option = "plasma") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "none")
## view plot
plot(p)

## make animated plot
## make a category for animation
#df_animation$group <- cut(df_animation$pt, 15)

anim <- p +
  transition_time(pt) +
  shadow_mark()

animate(anim, height = 3, width = 3, units = "in", res = 150, bg = 'transparent')

## to change the resolution - https://stackoverflow.com/questions/49058567/define-size-for-gif-created-by-gganimate-change-dimension-resolution 
```
Save animation
```{r}
anim_save("animated_UMAP_transparent_bg_wt.gif", path = "../images_to_export/")
```

```{r}
## extract pt values
pt_values <- as.data.frame(pseudotime(monocle.object.all, reduction_method = "UMAP"))

tenx.justwt.integrated <- AddMetaData(tenx.justwt.integrated, pt_values, "old_pt_values")
```

```{r}
library(gganimate)

## make dataframe for plotting
## extract data for GGplot version of this
df_animation <- as.data.frame(tenx.justwt.integrated@reductions[["umap"]]@cell.embeddings)
## add pt to this data frame:
pt_values <- as.data.frame(tenx.justwt.integrated@meta.data[ ,c("old_pt_values", "pt_id_cols"), drop = FALSE])
df_animation <- merge(df_animation, pt_values, by="row.names") 
rownames(df_animation) <- df_animation$Row.names
colnames(df_animation)[4] <- "pt"
## remove outlier cells
df_animation <- df_animation[which(df_animation$Row.names %ni% cells_to_remove), ]

#df_animation <- df_animation[order(df_animation$pt, decreasing = TRUE), ]

## make a category for animation
#df_animation$group <- as.numeric(cut(df_animation$pt, 10, include.lowest = TRUE, labels = c(1:10)))

## make the static plot
p <- ggplot(df_animation, aes(x = UMAP_1, y = UMAP_2)) +
  geom_point(col = df_animation$pt_id_cols) +
  #scale_colour_viridis_c(option = "plasma") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "none") +
  view_static()
## view plot
plot(p)

## make animated plot


anim <- p +
  transition_time(pt) +
  shadow_mark() +
  #shadow_wake(wake_length = 0.99) #+
  enter_fade()

animate(anim, 
        height = 3, 
        width = 3, 
        units = "in", 
        res = 300, 
        bg = 'white', 
        #duration = 6, 
        fps = 20, 
        device = "png",
        #ref_frame = 1
        end_pause = 10
        )

## save
anim_save("animated_UMAP_highres.gif", path = "../images_to_export/")

## to change the resolution - https://stackoverflow.com/questions/49058567/define-size-for-gif-created-by-gganimate-change-dimension-resolution 
```


#### Define cell identities with branches

Define identities of cells 

male
```{r}
monocle.object_male <- choose_graph_segments(monocle.object.all)
```
female
```{r}
monocle.object_female <- choose_graph_segments(monocle.object.all)
```
bipotential
```{r}
monocle.object_bipot <- choose_graph_segments(monocle.object.all)
```
asexual (pre-branch)
```{r}
monocle.object_asex_pre <- choose_graph_segments(monocle.object.all)
```
asexual fate
```{r}
monocle.object_asex_fate <- choose_graph_segments(monocle.object.all)
```
check
```{r}
df_freq <- data.frame(table(c(colnames(monocle.object_male), colnames(monocle.object_female), colnames(monocle.object_bipot), colnames(monocle.object_asex_pre), colnames(monocle.object_asex_fate))))
paste("number of cells in seurat object is", length(colnames(monocle.object.all)), ". The number of cells selected here with an identitity is", dim(df_freq)[1])
df_freq <- df_freq[df_freq$Freq > 1, ]
df_freq
```
Inspect where these missing cells are:
```{r}
# '%ni%' <- Negate('%in%')
# 
# not_assigned_cells <- colnames(monocle.object.all)[colnames(monocle.object.all) %ni% c(colnames(monocle.object_male), colnames(monocle.object_female), colnames(monocle.object_bipot), colnames(monocle.object_asex), colnames(monocle.object_asex_fate))]
# 
# DimPlot(seurat.object, repel = TRUE, label.size = 5, pt.size = 0.5, cells.highlight = not_assigned_cells, dims = c(2,1), reduction = "DIM_UMAP") +
#   coord_fixed() + 
#   scale_color_manual(values=c("#000000", "#f54e1e"))
```

```{r}
## create annotation dataframe from these results:
df_monocle_sexes <- rbind(data.frame("cell_name" = colnames(monocle.object_male), "sex" = rep("Male", length(colnames(monocle.object_male)))),
                          data.frame("cell_name" = colnames(monocle.object_female), "sex" = rep("Female", length(colnames(monocle.object_female)))),
                          data.frame("cell_name" = colnames(monocle.object_bipot), "sex" = rep("Bipotential", length(colnames(monocle.object_bipot)))),
                          data.frame("cell_name" = colnames(monocle.object_asex_pre), "sex" = rep("Asexual_Early", length(colnames(monocle.object_asex_pre)))),
                          data.frame("cell_name" = colnames(monocle.object_asex_fate), "sex" = rep("Asexual_Late", length(colnames(monocle.object_asex_fate))))
                          #data.frame("cell_name" = not_assigned_cells, "sex" = rep("Unassigned", length(not_assigned_cells)))
                          )

dim(df_monocle_sexes)

## order like the metadata
df_monocle_sexes <- df_monocle_sexes[match(rownames(monocle.object.all@colData), df_monocle_sexes$cell_name), ]

## add this back into the monocle object
monocle.object.all@colData$Sexes_monocle <- df_monocle_sexes$sex

## add this to the seurat object
rownames(df_monocle_sexes) <- df_monocle_sexes$cell_name
df_monocle_sexes_to_add_to_seurat <- df_monocle_sexes[,c("sex"), drop = FALSE]
tenx.justwt.integrated <- AddMetaData(tenx.justwt.integrated, df_monocle_sexes_to_add_to_seurat, col.name = "monocle_sex")
```

pt vs. real time
```{r}
library(ggridges)

df_plotting <- as.data.frame(tenx.justwt.integrated@meta.data)

pt_real_plot <- ggplot(df_plotting, aes(x = old_pt_values, y =  as.factor(Prediction.Spearman.), fill = as.factor(Prediction.Spearman.))) +
  geom_density_ridges(alpha = .1,
                      rel_min_height = 0.03,  
                               quantile_lines = TRUE, 
                               quantiles = 2,
                      scale = 2) + # plot rigdelines without points
  # plot points with invisible ridgelines
  geom_density_ridges_gradient(point_alpha = 1, 
                               alpha = 0,
                               colour = NA,
                               na.rm = FALSE, 
                               jittered_points = TRUE,
                               aes(point_color = monocle_sex)
                               )  +
  theme_classic() +
  scale_y_discrete(drop=FALSE) +
  labs(title = '', x = "Pseudotime", y = "Predicted Real-time (hpi)") +
  #scale_fill_viridis_c(name = "pseudotime", option = "C", alpha = 1) +
  theme(legend.position = "bottom",
        panel.grid.major.y = element_line(size = (0.2), colour="grey")
        )

pt_real_plot
```

```{r}
gg <- ggplot(df_plotting, aes(as.factor(Prediction.Spearman.), fill = monocle_sex)) + geom_bar(stat = "count", position="fill")

gg
```


# 8. Plots {.tabset}

make composite pseudotime/ID figure

```{r}
# 1 = blue - "#0052c5"
# 2 = red - "#a52b1e"
# 3 = green - "#016c00"
# 4 = yellow - "#ffe400"
#pal_sex <- c("#0052c5","#ffe400", "#a52b1e", "#016c00")

## extract pseudotime numbers and identity of cells to a dataframe
df_pt_id <- tenx.justwt.integrated@meta.data[,c("old_pt_values", "monocle_sex")]

## inspect possible values
list_of_sexes <- names(table(df_pt_id$monocle_sex))

## make a new column
df_pt_id$colour <- NA

## make colour ramps
asex_ramp <- colorRampPalette(c("#D5E3F5", "#0052c5"))
male_ramp <- colorRampPalette(c("white", "yellow", "#016c00"))
female_ramp <- colorRampPalette(c("yellow", "#a52b1e"))
bipot_ramp <- colorRampPalette(c("white", "#ffe400"))

## re-classify the cells that are unassigned cells removed from sexual branch above:
#df_pt_id[which(rownames(df_pt_id) %in% remove_cells), ]$monocle_sex <- "Asexual"

## assign values to each cluster
## help here: https://stackoverflow.com/questions/9946630/colour-points-in-a-plot-differently-depending-on-a-vector-of-values 
df_pt_id[df_pt_id$monocle_sex == "Asexual_Early" | df_pt_id$monocle_sex == "Asexual_Late", ]$colour <- asex_ramp(100)[as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Asexual_Early" | df_pt_id$monocle_sex == "Asexual_Late", ]$old_pt_values,breaks = 100))]

df_pt_id[df_pt_id$monocle_sex == "Male", ]$colour <- male_ramp(100)[as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Male", ]$old_pt_values,breaks = 100))]

df_pt_id[df_pt_id$monocle_sex == "Female", ]$colour <- female_ramp(100)[as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Female", ]$old_pt_values,breaks = 100))]

df_pt_id[df_pt_id$monocle_sex == "Bipotential", ]$colour <- bipot_ramp(100)[as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Bipotential", ]$old_pt_values,breaks = 100))]

## check everything has a value
#table(is.na(df_pt_id$colour))

## make into a df
#df_pt_id <- df_pt_id[ ,"colour", drop = FALSE]

## add back to seurat object
df <- df_pt_id[ ,"colour", drop = FALSE]
tenx.justwt.integrated <- AddMetaData(tenx.justwt.integrated, df, "pt_id_cols")
rm(df)

## plot
## extract UMAP coords
df_umap_plot <- tenx.justwt.integrated@reductions[["umap"]]@cell.embeddings
df_umap_plot <- merge(df_umap_plot, df_pt_id, by=0, all=TRUE)

## add tree
##The tree for monocle is located here:
# monocle.object@principal_graph_aux[["UMAP"]]$dp_mst 
ica_space_df <- t(monocle.object.all@principal_graph_aux[["UMAP"]]$dp_mst) %>%
      as.data.frame() %>%
      dplyr::select_(prin_graph_dim_1 = "UMAP_1", prin_graph_dim_2 = "UMAP_2") %>%
      dplyr::mutate(sample_name = rownames(.),
                    sample_state = rownames(.))

dp_mst <- monocle.object.all@principal_graph[["UMAP"]]

edge_df <- dp_mst %>%
      igraph::as_data_frame() %>%
      dplyr::select_(source = "from", target = "to") %>%
      dplyr::left_join(ica_space_df %>%
                         dplyr::select_(
                           source="sample_name",
                           source_prin_graph_dim_1="prin_graph_dim_1",
                           source_prin_graph_dim_2="prin_graph_dim_2"),
                       by = "source") %>%
      dplyr::left_join(ica_space_df %>%
                         dplyr::select_(
                           target="sample_name",
                           target_prin_graph_dim_1="prin_graph_dim_1",
                           target_prin_graph_dim_2="prin_graph_dim_2"),
                       by = "target")

## make ggplot
umap_id_pt <- ggplot(df_umap_plot, aes(x = UMAP_1, y = UMAP_2)) + 
                     geom_point(col = df_umap_plot$colour) +
                     theme_void() +
                     coord_fixed() +
                     geom_segment(aes_string(x="source_prin_graph_dim_1",
                                     y="source_prin_graph_dim_2",
                                     xend="target_prin_graph_dim_1",
                                     yend="target_prin_graph_dim_2"),
                          data=edge_df)

umap_id_pt
```

save
```{r}
#ggsave("../images_to_export/umap_id_pt.png", plot = umap_id_pt, device = "png", path = NULL, scale = 1, width = 21, height = 29.7, units = "cm", dpi = 300, limitsize = TRUE)
```
cluster plot
```{r}
### Prepare data

## extract df with cluster ID, 
df_pt_id <- tenx.justwt.integrated@meta.data[ ,"seurat_clusters", drop=FALSE]
## extract UMAP coords
df_plot <- tenx.justwt.integrated@reductions[["umap"]]@cell.embeddings
df_plot <- merge(df_plot, df_pt_id, by=0, all=TRUE)
## add colour col
df_plot$colour <- df_plot$seurat_clusters
levels(df_plot$colour) <- list(print(asex_ramp(17)[1]) = "9", asex_ramp(17)[2]="4", 
                               asex_ramp(17)[3]="15", 
                                asex_ramp(17)[4] = "8", 
                                asex_ramp(17)[5]= "1",
                                asex_ramp(17)[6]=  "14", 
                                asex_ramp(17)[7] = "2", 
                                asex_ramp(17)[8] = "10",
                                asex_ramp(17)[9] = "3",
                                asex_ramp(17)[10] = "0",
                                asex_ramp(17)[11] = "6",
                                asex_ramp(17)[12] = "5",
                                asex_ramp(17)[13] = "7",
                                asex_ramp(17)[14] = "12",
                                asex_ramp(17)[15] = "18",
                                asex_ramp(17)[16] = "20",
                                asex_ramp(17)[17] = "23",
                                bipot_ramp(3)[2] = "11",
                                male_ramp(3)[2] = "16",
                                male_ramp(3)[3] = "13",
                                female_ramp(4)[2] = "21",
                                female_ramp(4)[3] = "22",
                                female_ramp(4)[4] = "19",
                                female_ramp(4)[4] = "17"
                                      )
## aabbreviate clusters
df_plot$cluster_names <- df_plot$seurat_clusters
levels(df_plot$cluster_names) <- list(A_1="9", 
                                      A_2="4", 
                                      A_3="15", 
                                      A_4 = "8", 
                                      A_5 = "1",
                                      A_6=  "14", 
                                      A_7 = "2", 
                                      A_8 = "10",
                                      A_9 = "3",
                                      A_10 = "0",
                                      A_11 = "6",
                                      A_12 = "5",
                                      A_13 = "7",
                                      A_14 = "12",
                                      A_15 = "18",
                                      A_16 = "20",
                                      A_17 = "23",
                                      B = "11",
                                      M_1 = "16",
                                      M_2 = "13",
                                      F_1 = "21",
                                      F_2 = "22",
                                      F_3 = "19",
                                      F_3 = "17"
                                      )

library(dplyr)
dat%>%
group_by(custid)%>% 
summarise(Mean=mean(value), Max=max(value), Min=min(value), Median=median(value), Std=sd(value))



## plot
umap_id <- ggplot(df_plot, aes(x = UMAP_1, y = UMAP_2)) + 
                     geom_point(col = df_plot$colour) +
                     theme_void() +
                     coord_fixed()

umap_id
```

```{r}
## these get defined later on, but are replicated above here for plotting
asexual_early_clusters <- c(9, 4, 15, 8, 1, 14, 2, 10, 3, 0, 6, 5)
asexual_late_clusters <- c(7, 12, 18, 20, 23)
bipotentional_early_clusters <- # 0?
bipotential_clusters <- c(11) 
male_clusters <- c(16, 13)
female_clusters <- c(21, 22, 17, 19)

## make a new column for plotting
tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting <- tenx.justwt.integrated@meta.data$seurat_clusters

## reorder the levels so you can plot the cluters as you wish
my_levels <- c(asexual_early_clusters,asexual_late_clusters, bipotential_clusters, male_clusters, female_clusters)
tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting <- factor(x = tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting, levels = my_levels)

## change the levels
levels(tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting) <- list(A_1="9", 
                                      A_2="4", 
                                      A_3="15", 
                                      A_4 = "8", 
                                      A_5 = "1",
                                      A_6=  "14", 
                                      A_7 = "2", 
                                      A_8 = "10",
                                      A_9 = "3",
                                      A_10 = "0",
                                      A_11 = "6",
                                      A_12 = "5",
                                      A_13 = "7",
                                      A_14 = "12",
                                      A_15 = "18",
                                      A_16 = "20",
                                      A_17 = "23",
                                      P = "11",
                                      M_1 = "16",
                                      M_2 = "13",
                                      F_1 = "21",
                                      F_2 = "22",
                                      F_3 = "19",
                                      F_3 = "17"
                                      )

### Annotation set-up

## extract pseudotime numbers and identity of cells to a dataframe
df_pt_id <- tenx.justwt.integrated@meta.data[,c("old_pt_values", "monocle_sex", "seurat_clusters_dot_plotting", "Prediction.Spearman.", "Prediction.Spearman._Kasia")]

## make a new column
df_pt_id$colour <- NA

## assign bins to each of the values
## help here: https://stackoverflow.com/questions/9946630/colour-points-in-a-plot-differently-depending-on-a-vector-of-values 
df_pt_id[df_pt_id$monocle_sex == "Asexual_Early" | df_pt_id$monocle_sex == "Asexual_Late", ]$colour <- as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Asexual_Early" | df_pt_id$monocle_sex == "Asexual_Late", ]$old_pt_values,breaks = 100))

df_pt_id[df_pt_id$monocle_sex == "Male", ]$colour <- as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Male", ]$old_pt_values,breaks = 100))

df_pt_id[df_pt_id$monocle_sex == "Female", ]$colour <- as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Female", ]$old_pt_values,breaks = 100))

df_pt_id[df_pt_id$monocle_sex == "Bipotential", ]$colour <- as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Bipotential", ]$old_pt_values,breaks = 100))

## make colour ramps
asex_ramp <- colorRampPalette(c("#D5E3F5", "#0052c5"))
male_ramp <- colorRampPalette(c("white", "yellow", "#016c00"))
female_ramp <- colorRampPalette(c("yellow", "#a52b1e"))
bipot_ramp <- colorRampPalette(c("white", "#ffe400"))

## assign values to each cluster
## take the mean of the bin
df_annotation <- aggregate(df_pt_id[, "colour"], list(df_pt_id$seurat_clusters_dot_plotting), mean)
## BECAUSE we have ordered the clusters already, we can simply take the row index for this bit
df_annotation$colour <- NA
df_annotation[1:17, ]$colour <- asex_ramp(100)[df_annotation[1:17, ]$x]
df_annotation[18, ]$colour <- bipot_ramp(100)[df_annotation[18, ]$x]
df_annotation[19:20, ]$colour <- male_ramp(100)[df_annotation[19:20, ]$x]
df_annotation[21:23, ]$colour <- female_ramp(100)[df_annotation[21:23, ]$x]

## make colour pallete
#pal_plot <- c(asex_ramp(170)[1,11,21,31,], bipot_ramp(3)[2], male_ramp(5)[3:4], female_ramp(5)[2:5])
pal_plot <- c(df_annotation$colour, '#49C16DFF')

## plot
umap_with_clusters <- DimPlot(tenx.justwt.integrated,
        group.by = "seurat_clusters_dot_plotting",
        dims = c(1,2), 
        reduction = "umap", 
        pt.size = 1,
        label.box = TRUE,
        label.size = 8,
        label.color	= c(rep(c("#000000"), 12), rep(c("#ffffff"), 5), rep(c("#000000"), 2),"#ffffff", "#000000", "#ffffff", "#ffffff"),
        order = TRUE, 
        repel = TRUE, 
        label = TRUE,
        cols = pal_plot) +
  scale_color_manual(labels = c("Asexual 1", "Asexual 2" , "Asexual 3", "Asexual 4", "Asexual 5", "Asexual 6", "Asexual 7", "Asexual 8",  "Asexual 9", "Asexual 10", "Asexual 11", "Asexual 12", "Asexual 13", "Asexual 14", "Asexual 15", "Asexual 16", "Asexual 17", "Progenitor", "Male 1", "Male 2", "Female 1", "Female 2", "Female 3"), values = pal_plot) +
  theme_void() +
                     coord_fixed()

umap_with_clusters
```

save
```{r}
ggsave("../images_to_export/umap_with_clusters.png", plot = umap_with_clusters, device = "png", path = NULL, scale = 1, width = 21, height = 29.5, units = "cm", dpi = 300, limitsize = TRUE)
```

```{r}
## extract proportion of cells
library(plyr)
df_prop_comb <- as.data.frame(table(tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting, tenx.justwt.integrated@meta.data$experiment))
names(df_prop_comb) <- c("cluster", "experiment", "Freq")

## calcualte the percentage
df_prop_comb$pc <- NA
df_prop_comb[df_prop_comb$experiment == "mutants", ]$pc <- (df_prop_comb[df_prop_comb$experiment == "mutants", ]$Freq/sum(df_prop_comb[df_prop_comb$experiment == "mutants", ]$Freq))*100
df_prop_comb[df_prop_comb$experiment == "tenx_5k", ]$pc <- (df_prop_comb[df_prop_comb$experiment == "tenx_5k", ]$Freq/sum(df_prop_comb[df_prop_comb$experiment == "tenx_5k", ]$Freq))*100

## reorder levels in Var1 
#df_prop_comb$cluster <- factor(df_prop_comb$cluster, levels = c("unassigned", "M", "F", "5", "7", "4", "3", "1", "0", "2", "6"))

## then reorder by this so the cumsum will work below
df_prop_comb <- df_prop_comb[rev(order(df_prop_comb$experiment, df_prop_comb$cluster)),]

## Calculate the cumulative sum of len for each dose
df_cumsum <- ddply(df_prop_comb, "experiment", transform, label_ypos=cumsum(pc) - 0.5*pc)
head(df_cumsum)
# http://www.sthda.com/english/wiki/ggplot2-barplots-quick-start-guide-r-software-and-data-visualization

## reverse the levels in Var1
#df_cumsum$cluster <- factor(df_cumsum$cluster, levels = rev(c(levels(df_cumsum$cluster))))
#df_cumsum <- df_cumsum[c(match(df_cumsum$cluster[1:23], c(levels(df_cumsum$cluster))), match(df_cumsum$cluster[1:23], c(levels(df_cumsum$cluster))) + 23), ]

library(ggrepel)
library(ggpubr)
## make plot
plot_prop <- ggplot(data=df_cumsum, aes(x=experiment, y=pc, fill=cluster)) +
  geom_bar(stat="identity")+
  geom_label_repel(aes(label = Freq, y=label_ypos, fill = factor(cluster)), color = c(rep(c(rep(c("#ffffff"), 2), "#000000", "#ffffff", rep(c("#000000"), 2), rep(c("#ffffff"), 5), rep(c("#000000"), 12)), 2)), size = 3.5, show.legend = FALSE) +
  #geom_text(aes(y=label_ypos, label=Freq), vjust=0, color="black", size=3.5) +
  labs(fill = "Cell cluster", y= "Cell Proportions (%)", x = "Technology") +
  scale_fill_manual(labels = c("Asexual 1", "Asexual 2" , "Asexual 3", "Asexual 4", "Asexual 5", "Asexual 6", "Asexual 7", "Asexual 8",  "Asexual 9", "Asexual 10", "Asexual 11", "Asexual 12", "Asexual 13", "Asexual 14", "Asexual 15", "Asexual 16", "Asexual 17", "Progenitor", "Male 1", "Male 2", "Female 1", "Female 2", "Female 3"), values = pal_plot) +
  theme_pubr() +
  theme(legend.text=element_text(size=10), legend.position="bottom") +
  scale_x_discrete(labels= rev(c("10x", "Smart-seq2"))) +
  coord_flip() +
  guides(fill=guide_legend(ncol=9))


plot_prop
```
```{r}
ggsave("../images_to_export/WT_cell_type_proportions.png", plot = plot_prop, device = "png", path = NULL, scale = 1, width = 30, height = 9, units = "cm", dpi = 300, limitsize = TRUE)
```

### dotplot

A dotplot allows us to look at the expression of multiple genes in a clearer way than succesive UMAP plots

```{r, fig.width = 9, fig.height= 8}
### Data set-up

# PBANKA-0828000         GCSKO-3  GD1

# PBANKA-1302700       GCSKO-oom  MD1 
# PBANKA-1447900        GCSKO-29  MD2
# PBANKA-0413400    GCSKO-10_820  MD3
# PBANKA-0716500        GCSKO-19  MD4 
# PBANKA-0102400         GCSKO-2  MD5 

# PBANKA-1454800        GCSKO-21  FD1
# PBANKA-0902300        GCSKO-13  FD2
# PBANKA-1418100        GCSKO-17  FD3   
# PBANKA-1435200        GCSKO-20  FD4 

# PBANKA-1437500 - AP2G - commitment
# PBANKA-1319500 - CCP2 - female - used in 820 line
# PBANKA-0416100 - MG1 - dynenin heavy chain - male - used in 820 line
# PBANKA-0831000 - MSP1 - late asexual
# PBANKA-1102200 - MSP8 - early asexual (from Bozdech paper)

marker_genes_list <- c("PBANKA-1437500", "PBANKA-1319500", "PBANKA-0416100", "PBANKA-0831000", "PBANKA-1102200")
mutant_genes_list <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0413400", "PBANKA-0716500", "PBANKA-0102400",  "PBANKA-1454800", "PBANKA-0902300", "PBANKA-1418100", "PBANKA-1435200")

## these get defined later on, but are replicated above here for plotting
asexual_early_clusters <- c(9, 4, 15, 8, 1, 14, 2, 10, 3, 0, 6, 5)
asexual_late_clusters <- c(7, 12, 18, 20, 23)
bipotentional_early_clusters <- # 0?
bipotential_clusters <- c(11) 
male_clusters <- c(16, 13)
female_clusters <- c(21, 22, 17, 19)

## copy the clusters so you don't permanently edit the master
tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting <- tenx.justwt.integrated@meta.data$seurat_clusters

## reorder the levels so you can plot the cluters as you wish
my_levels <- c(asexual_early_clusters,asexual_late_clusters, bipotential_clusters, male_clusters, female_clusters)

## reorder the levels
tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting <- factor(x = tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting, levels = my_levels)

## rename clusters so that they are intuitive
# this trick is from here: http://www.cookbook-r.com/Manipulating_data/Renaming_levels_of_a_factor/
levels(tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting) <- list(Asexual_1="9", 
                                                                              Asexual_2="4", 
                                                                              Asexual_3="15", 
                                                                              Asexual_4 = "8", 
                                                                              Asexual_5 = "1",
                                                                              Asexual_6=  "14", 
                                                                              Asexual_7 = "2", 
                                                                              Asexual_8 = "10",
                                                                              Asexual_9 = "3",
                                                                              Asexual_10 = "0",
                                                                              Asexual_11 = "6",
                                                                              Asexual_12 = "5",
                                                                              Asexual_13 = "7",
                                                                              Asexual_14 = "12",
                                                                              Asexual_15 = "18",
                                                                              Asexual_16 = "20",
                                                                              Asexual_17 = "23",
                                                                              Progenitor = "11",
                                                                              Male_1 = "16",
                                                                              Male_2 = "13",
                                                                              Female_1 = "21",
                                                                              Female_2 = "22",
                                                                              Female_3 = "19",
                                                                              Female_3 = "17"
                                                                              )

### Annotation set-up

## extract pseudotime numbers and identity of cells to a dataframe
df_pt_id <- tenx.justwt.integrated@meta.data[,c("old_pt_values", "monocle_sex", "seurat_clusters_dot_plotting", "Prediction.Spearman.", "Prediction.Spearman._Kasia")]

## make a new column
df_pt_id$colour <- NA

## assign bins to each of the values
## help here: https://stackoverflow.com/questions/9946630/colour-points-in-a-plot-differently-depending-on-a-vector-of-values 
df_pt_id[df_pt_id$monocle_sex == "Asexual_Early" | df_pt_id$monocle_sex == "Asexual_Late", ]$colour <- as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Asexual_Early" | df_pt_id$monocle_sex == "Asexual_Late", ]$old_pt_values,breaks = 100))

df_pt_id[df_pt_id$monocle_sex == "Male", ]$colour <- as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Male", ]$old_pt_values,breaks = 100))

df_pt_id[df_pt_id$monocle_sex == "Female", ]$colour <- as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Female", ]$old_pt_values,breaks = 100))

df_pt_id[df_pt_id$monocle_sex == "Bipotential", ]$colour <- as.numeric(cut(df_pt_id[df_pt_id$monocle_sex == "Bipotential", ]$old_pt_values,breaks = 100))

## make colour ramps
asex_ramp <- colorRampPalette(c("#D5E3F5", "#0052c5"))
male_ramp <- colorRampPalette(c("white", "yellow", "#016c00"))
female_ramp <- colorRampPalette(c("yellow", "#a52b1e"))
bipot_ramp <- colorRampPalette(c("white", "#ffe400"))

## assign values to each cluster
## take the mean of the bin
df_annotation <- aggregate(df_pt_id[, "colour"], list(df_pt_id$seurat_clusters_dot_plotting), mean)
## BECAUSE we have ordered the clusters already, we can simply take the row index for this bit
df_annotation$colour <- NA
df_annotation[1:17, ]$colour <- asex_ramp(100)[df_annotation[1:17, ]$x]
df_annotation[18, ]$colour <- bipot_ramp(100)[df_annotation[18, ]$x]
df_annotation[19:20, ]$colour <- male_ramp(100)[df_annotation[19:20, ]$x]
df_annotation[21:23, ]$colour <- female_ramp(100)[df_annotation[21:23, ]$x]

## plot annotation 
## this really helped: https://www.biostars.org/p/396810/
h2 <- ggplot(df_annotation)+
  geom_bar(mapping = aes(x = Group.1, y = 1), 
           stat = "identity",
           fill = df_annotation$colour,
           col = "#FFFFFF",
           width = 1)+
      theme_void()+
      theme(panel.spacing.x = unit(1, "mm"))#+
     #facet_grid(.~colour, scales = "free_x")
     #legend <- plot_grid(get_legend(h2), get_legend(h1), ncol = 1)
 h2 <- h2 + theme(legend.position = "none")
 #   geom_point(col = df_umap_plot$colour)
 
 h1 <- ggplot(df_annotation)+
  geom_point(mapping = aes(x = Group.1, y = 1),
            col = df_annotation$colour,
            size = 5)+
      theme_void()+
      theme(panel.spacing.x = unit(1, "mm"))#+
     #facet_grid(.~colour, scales = "free_x")
     #legend <- plot_grid(get_legend(h2), get_legend(h1), ncol = 1)
 h1 <- h1 + theme(legend.position = "none")
 #   geom_point(col = df_umap_plot$colour)

 ## add predicted time point
 ## take the mean of the bin
df_annotation <- aggregate(df_pt_id[, "Prediction.Spearman."], list(df_pt_id$seurat_clusters_dot_plotting), mean)
df_annotation$colour <- NA
df_annotation$colour <- viridis(300)[(df_annotation$x)*10]

h3 <- ggplot(df_annotation)+
  geom_bar(mapping = aes(x = Group.1, y = 1), 
           stat = "identity",
           fill = df_annotation$colour,
           col = "#FFFFFF",
           width = 1)+
      theme_void()+
      theme(panel.spacing.x = unit(1, "mm"))#+
     #facet_grid(.~colour, scales = "free_x")
     legend_h3 <- get_legend(h3)
 h3 <- h3 + theme(legend.position = "none")

  h3 <- ggplot(data = df_annotation, 
              mapping = aes(x = Group.1, fill=x)
              ) +
       geom_bar(col = "#FFFFFF", width = 1)+
       theme_void() +
       theme(panel.spacing.x = unit(1, "mm")) +
       scale_fill_viridis(option = 'cividis')
 legend_h3 <- get_legend(h3)
 h3 <- h3 + theme(legend.position = "none")
 
 ## add kasia data
  ## take the mean of the bin
df_annotation <- aggregate(df_pt_id[, "Prediction.Spearman._Kasia"], list(df_pt_id$seurat_clusters_dot_plotting), mean)
df_annotation$colour <- NA
df_annotation$colour <- viridis(300)[(df_annotation$x)*10]

h4 <- ggplot(df_annotation)+
  geom_bar(mapping = aes(x = Group.1, y = 1), 
           stat = "identity",
           fill = df_annotation$colour,
           col = "#FFFFFF",
           width = 1)+
      theme_void()+
      theme(panel.spacing.x = unit(1, "mm"))+
     facet_grid(.~colour, scales = "free_x")
     legend <- plot_grid(get_legend(h4), ncol = 1)
    legend_h4 <- get_legend(h4)
 h4 <- h4 + theme(legend.position = "none")

 h4 <- ggplot(data = df_annotation, 
              mapping = aes(x = Group.1, fill=x)
              ) +
       geom_bar(col = "#FFFFFF", width = 1)+
       theme_void() +
       theme(panel.spacing.x = unit(1, "mm")) +
       scale_fill_viridis()
 legend_h4 <- get_legend(h4)
 h4 <- h4 + theme(legend.position = "none")

### Plot
dot_plot_markers <- DotPlot(tenx.justwt.integrated, 
                            features = c(marker_genes_list, rev(mutant_genes_list)), 
                            group.by = "seurat_clusters_dot_plotting", 
                            dot.min = 0.00001,
                            assay = 'RNA') +
  theme_classic() +
  coord_fixed() +
  coord_flip() +
  # change appearance and remove axis elements, and make room for arrows
  theme(axis.text.x = element_text(size=16, angle = 45, hjust=1,vjust=1, family = "Arial"), 
        axis.text.y = element_text(size=16, face="italic"), 
        text=element_text(size=16, family="Arial", colour="black"), 
        legend.position = "bottom", 
        legend.direction = "horizontal", 
        legend.box = "vertical", 
        plot.title = element_blank(), 
        plot.margin = unit(c(1,3,1,3), "lines"), 
        axis.line = element_blank(),
        panel.border = element_rect(colour = "black", fill=NA, size=0.5)) +
  #change the colours
  #scale_colour_viridis(option = "inferno", guide = "colourbar", na.value="white", begin = 0, end = 1, direction = 1) +
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## change x axis label
  # labs(x = "Marker Genes", y = "Cluster", title = "Expression of Marker Genes by Cluster") 
  labs(x = "", y = "", title = "") +
  ## add arrows
  #annotate("segment", x = 5.5, xend = 5.5, y = 21.5, yend = 25, colour = "green", size=1, alpha=1, arrow=arrow(length=unit(0.30,"cm"), type = "closed")) +
  #annotate("segment", x = 5.5, xend = 5.5, y = 16.5, yend = 21.5, colour = "red", size=1, alpha=1, arrow=arrow(length=unit(0.30,"cm"), type = "closed")) +
  #annotate("segment", x = 5.5, xend = 5.5, y = 0, yend = 15.5, colour = "grey", size=1, alpha=1, arrow=arrow(length=unit(0.30,"cm"), type = "closed")) +
  ## annotate asex
  geom_hline(aes(yintercept = (length(c(asexual_early_clusters, asexual_late_clusters))+0.5)), size = 0.5, linetype= 'dashed') +
  ## annotate bipotential
  geom_hline(aes(yintercept = (length(c(asexual_early_clusters, asexual_late_clusters, bipotential_clusters))+0.5)), size = 0.5, linetype= 'dashed') +
  ## annotate sexes
  geom_hline(aes(yintercept = (length(c(asexual_early_clusters, asexual_late_clusters, bipotential_clusters, male_clusters))+0.5)), size = 0.5, linetype= 'dashed') +
  ## change label on bottom of plot so we can indicate markers
  scale_x_discrete(labels = rev(c("gd1", "md1", "md2", "md3", "md4", "md5", "fd1", "fd2", "fd3", "fd4", "msp8", "msp1", "mg1", "ccp2", "ap2g"))) +
  ## change label on bottom of plot so we can indicate markers
  scale_y_discrete(labels = c("Asexual 1", "Asexual 2" , "Asexual 3", "Asexual 4", "Asexual 5", "Asexual 6", "Asexual 7", "Asexual 8",  "Asexual 9", "Asexual 10", "Asexual 11", "Asexual 12", "Asexual 13", "Asexual 14", "Asexual 15", "Asexual 16", "Asexual 17", "Progenitor", "Male 1", "Male 2", "Female 1", "Female 2", "Female 3")) +
  ## change name of legends
  guides(col=guide_colorbar(title = 'Scaled Average Expression'),
         size=guide_legend("% of cells expressing"))

## view
#print(dot_plot_markers)

plot <- plot_grid(h4, h3, h1, dot_plot_markers, align = "v", ncol = 1, axis = "tb", rel_heights = c(0.5, 0.5, 0.5, 18)) 

dot_plot <- plot_grid(plot, legend_h3, legend_h4, nrow = 1, rel_widths = c(10, 1.5, 1.5))

dot_plot
```

save
```{r}
ggsave("../images_to_export/dot_plot_all.png", plot = dot_plot, device = "png", path = NULL, scale = 1, width = 29.7, height = 21, units = "cm", dpi = 300, limitsize = TRUE)
```

Make individual plots highlighting where cells in each cluster fall
```{r, echo = FALSE, message=FALSE}
## for loop which takes each cluster and makes a list of cells and then plots a highlighted plot and adds it to a list

## make a blank list
list_UMAPs_by_cluster <- vector(mode = "list", length = length(levels(tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting)))

## for loop
for(i in seq_along(levels(tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting))){
  ## make a list of cells
  list_of_cells <- rownames(tenx.justwt.integrated@meta.data[which(tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting == levels(tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting)[i]), ])
  umap_plot <- DimPlot(tenx.justwt.integrated, label = FALSE, repel = TRUE, pt.size = 0.1, cells.highlight = list_of_cells, dims = c(1,2), reduction = "umap") +
    ## fix coordinates
    coord_fixed() + 
  scale_color_manual(values=c("#D3D3D3", "#1D1564")) + 
  theme_void() + 
  labs(title = paste(levels(tenx.justwt.integrated@meta.data$seurat_clusters_dot_plotting)[i])) + 
  theme(plot.title = element_text(hjust = 0.5), legend.position = "none")
  ## add to the list
  list_UMAPs_by_cluster[[i]] <- umap_plot
}

## check number of clusters
length(list_UMAPs_by_cluster)
```

plot
```{r, fig.height = 10, fig.width = 10}
## this function writes the next bit of code for you
## put it into the console and paste the response
#ploty <- c()
#for(i in seq_along(levels(tenx.justwt.integrated@meta.data$seurat_clusters))){
#  ploty <- paste0(ploty, "list_UMAPs_by_cluster[[", i, "]]", " + ")
#}

## plot
composite_plot <- plot_grid(list_UMAPs_by_cluster[[1]], list_UMAPs_by_cluster[[2]], list_UMAPs_by_cluster[[3]], list_UMAPs_by_cluster[[4]], list_UMAPs_by_cluster[[5]], list_UMAPs_by_cluster[[6]], list_UMAPs_by_cluster[[7]], list_UMAPs_by_cluster[[8]], list_UMAPs_by_cluster[[9]], list_UMAPs_by_cluster[[10]], list_UMAPs_by_cluster[[11]], list_UMAPs_by_cluster[[12]], list_UMAPs_by_cluster[[13]], list_UMAPs_by_cluster[[14]], list_UMAPs_by_cluster[[15]], list_UMAPs_by_cluster[[16]], list_UMAPs_by_cluster[[17]], list_UMAPs_by_cluster[[18]], list_UMAPs_by_cluster[[19]], list_UMAPs_by_cluster[[20]], list_UMAPs_by_cluster[[21]], list_UMAPs_by_cluster[[22]], list_UMAPs_by_cluster[[23]], ncol = 8)

composite_plot
```

save
```{r}
ggsave("../images_to_export/clusters_composite_plot.png", plot = composite_plot, device = "png", path = NULL, scale = 1, width = 21, height = 10, units = "cm", dpi = 300, limitsize = TRUE)
```

# 9. Subset sexual cells {.tabset}

Make a subsetted Seurat object of sexual cells. 

Include the pre-branch too as well as any weird clusters that may have clustered out. 

it's been a while since we looked at the clusters so let's check them out again:
```{r, fig.height = 10, fig.width = 10}
## Plot
DimPlot(tenx.justwt.integrated, label = TRUE, repel = FALSE, pt.size = 0.05, group.by = "seurat_clusters", dims = c(1,2), reduction = "umap") + coord_fixed()

## plot
list_UMAPs_by_cluster[[1]] + list_UMAPs_by_cluster[[2]] + list_UMAPs_by_cluster[[3]] + list_UMAPs_by_cluster[[4]] + list_UMAPs_by_cluster[[5]] + list_UMAPs_by_cluster[[6]] + list_UMAPs_by_cluster[[7]] + list_UMAPs_by_cluster[[8]] + list_UMAPs_by_cluster[[9]] + list_UMAPs_by_cluster[[10]] + list_UMAPs_by_cluster[[11]] + list_UMAPs_by_cluster[[12]] + list_UMAPs_by_cluster[[13]] + list_UMAPs_by_cluster[[14]] + list_UMAPs_by_cluster[[15]] + list_UMAPs_by_cluster[[16]] + list_UMAPs_by_cluster[[17]] + list_UMAPs_by_cluster[[18]] + list_UMAPs_by_cluster[[19]] + list_UMAPs_by_cluster[[20]] + list_UMAPs_by_cluster[[21]] + list_UMAPs_by_cluster[[22]] + list_UMAPs_by_cluster[[23]] + list_UMAPs_by_cluster[[24]]
```

### Define cells and subset
```{r}
asexual_early_clusters <- c(9, 4, 15, 8, 1, 14, 2, 10, 3, 0, 6, 5)
asexual_late_clusters <- c(7, 12, 18, 20, 23)
bipotentional_early_clusters <- # 0?
bipotential_clusters <- c(11) 
male_clusters <- c(16, 13)
female_clusters <- c(21, 22, 17, 19)

## define cells
cell_names_subset_monocle_ids <- rownames(tenx.justwt.integrated@meta.data[tenx.justwt.integrated@meta.data$monocle_sex %in% c("Asexual_Early", "Bipotential", "Male", "Female"), ])

## 3, 0, 6, 5 are early clusters of asexuals before the branch
cell_names_subset_cluster_ids <- rownames(tenx.justwt.integrated@meta.data[tenx.justwt.integrated@meta.data$seurat_clusters %in% c(male_clusters, female_clusters, bipotential_clusters, 3, 0, 6, 5), ])

cell_names_subset_intersect <- intersect(cell_names_subset_monocle_ids, cell_names_subset_cluster_ids)

## subset cells into new object
tenx.justwt.integrated.sex <- subset(tenx.justwt.integrated, cells = cell_names_subset_intersect)
```

### inspect/check
```{r}
## inspect object
tenx.justwt.integrated.sex

## look at original UMAP
DimPlot(tenx.justwt.integrated.sex, label = TRUE, repel = TRUE, pt.size = 0.1, split.by = "experiment", dims = c(1,2), reduction = "umap") + coord_fixed()
```

### Remove contaminant asexual cells

we want to remove:
```{r}
## look at original UMAP
plot_sexual_subsetting <- DimPlot(tenx.justwt.integrated.sex, label = TRUE, repel = TRUE, pt.size = 0.1, dims = c(1,2), reduction = "umap") + 
  coord_fixed() + 
  geom_vline(aes(xintercept = -1, alpha = 5))

plot_sexual_subsetting
```

```{r}
## extract cell embeddings
df_sex_cell_embeddings <- as.data.frame(tenx.justwt.integrated.sex@reductions[["umap"]]@cell.embeddings)

## subset anything lower than -0.8 in UMAP 2 and -0.1 in UMAP 1
remove_cells <- row.names(df_sex_cell_embeddings[which(df_sex_cell_embeddings$UMAP_1 < -1), ])

## plot these cells
DimPlot(tenx.justwt.integrated.sex, label = FALSE, repel = TRUE, pt.size = 0.1, cells.highlight = remove_cells, dims = c(1,2), reduction = "umap") + 
  coord_fixed() + 
  scale_color_manual(values=c("#000000", "#f54e1e")) + 
  theme_void() + 
  labs(title = paste("cells highlighted will be removed")) + 
  theme(plot.title = element_text(hjust = 0.5), legend.position = "none")
```

## Final Subset
```{r}
## make keep cells from the remove_cells
## make the not in function
'%ni%' <- Negate('%in%')
keep_cells <- colnames(tenx.justwt.integrated.sex)[which(colnames(tenx.justwt.integrated.sex) %ni% remove_cells)]

## subset
tenx.justwt.integrated.sex <- subset(tenx.justwt.integrated.sex, cells = keep_cells)

## inspect
tenx.justwt.integrated.sex
```

copy old clusters over
```{r}
## copy old clusters
tenx.justwt.integrated.sex <- AddMetaData(tenx.justwt.integrated.sex, tenx.justwt.integrated.sex@meta.data$seurat_clusters, col.name = "post_integration_clusters")
```

# 10. Sex assignment of mutants 

## A. Seurat Method

```{r}
## find transfer anchors
DefaultAssay(tenx.justwt.integrated) <- "integrated"
merge.anchors <- FindTransferAnchors(reference = tenx.justwt.integrated, query = GCSKO_mutants, 
    dims = 1:30)

## transfer data between ref and query
predictions <- TransferData(anchorset = merge.anchors, refdata = tenx.justwt.integrated$seurat_clusters_dot_plotting, 
    dims = 1:30)

## add meta data to object
GCSKO_mutants <- AddMetaData(GCSKO_mutants, metadata = predictions)

## new object from this
mutant_seurat <- GCSKO_mutants

## look at breakdown of mutant by designation
df <- as.data.frame(mutant_seurat@meta.data)
table(df$predicted.id, df$identity_name_updated)
```

```{r}
## see how this overlaps with fluorescence sorted on for 820 to confirm accuracy
df_820 <- df[df$genetic_background =="PBANKA_820", ]
table(df_820$predicted.id, df_820$fluoresence_sorted_on)
```

inspect gd1
```{r}
df[which(df$predicted.id == "Female_1" & df$identity_name_updated == "gd1"), ][, 118:146]
```


Calculate sex ratios
```{r}
## use designations above for sexes
male_cells <- rownames(mutant_seurat@meta.data[mutant_seurat@meta.data$predicted.id == "Male", ])
female_cells <- rownames(mutant_seurat@meta.data[mutant_seurat@meta.data$predicted.id == "Female", ])
ss2_mutants_final_male <- subset(mutant_seurat, cells = male_cells)
ss2_mutants_final_female <- subset(mutant_seurat, cells = female_cells)

## inspect
ss2_mutants_final_male
ss2_mutants_final_female
```

```{r}
## calculate sex ratios
##subset out H, sorted cells:
df_male <- ss2_mutants_final_male@meta.data[ss2_mutants_final_male@meta.data$exclude_for_sex_ratio == FALSE,]

dim(df_male)

df_female <- ss2_mutants_final_female@meta.data[ss2_mutants_final_female@meta.data$exclude_for_sex_ratio == FALSE,]

dim(df_female)

## make dataframe
df_sex_ratio <- merge(
  as.data.frame(table(df_male$sub_name_updated)), 
  as.data.frame(table(df_female$sub_name_updated)), 
  by = "Var1", all=TRUE)

# or use identity_updated

## add names
names(df_sex_ratio) <- c("genotype", "male", "female")

## change the NAs to 0
df_sex_ratio[is.na(df_sex_ratio)] <- 0

## collapse 820 wild-types together
combined_m <- df_sex_ratio[df_sex_ratio$genotype == "WT-820_3_5", ]$male + df_sex_ratio[df_sex_ratio$genotype == "WT-820", ]$male
combined_f <- df_sex_ratio[df_sex_ratio$genotype == "WT-820_3_5", ]$female + df_sex_ratio[df_sex_ratio$genotype == "WT-820", ]$female
df_sex_ratio <- rbind(df_sex_ratio, c("WT-820-combined", combined_m, combined_f))
# remove old rows
df_sex_ratio <- df_sex_ratio[-which(df_sex_ratio$genotype == "WT-820_3_5" | df_sex_ratio$genotype == "WT-820"), ]
# need to make numeric again
df_sex_ratio$male <- as.numeric(df_sex_ratio$male)
df_sex_ratio$female <- as.numeric(df_sex_ratio$female)
df_sex_ratio$genotype <- as.character(df_sex_ratio$genotype)
# add name for WT combined
df_sex_ratio$genotype[17] <- "WT-820-combined"

## calculate sex ratio
df_sex_ratio$sex_ratio <- (df_sex_ratio$male + 0.1)/(df_sex_ratio$female + 0.1)

## log sex ratio
df_sex_ratio$sex_ratio_log <- log10(df_sex_ratio$sex_ratio)

##view
df_sex_ratio
```

```{r}
## remove WT-2 because it is really inappropriate to have a sex ratio for this:
df_sex_ratio <- df_sex_ratio[-which(df_sex_ratio$genotype == "WT-md5"),]
```

plot
```{r, fig.height = 7, fig.width = 6}
library(latex2exp) # so you can plot the fraction, it's on CRAN `install.packages("latex2exp")`

## make extra column for plotting aesthetics:
df_sex_ratio$above <- df_sex_ratio$sex_ratio_log > 0

## make extra column with ratio in it:
df_sex_ratio$genotype_with_n <- paste0(df_sex_ratio$genotype, " (", df_sex_ratio$male, "/", df_sex_ratio$female, ")")

## reorder genotype so it is in the correct order for plotting
df_sex_ratio$genotype_with_n <- factor(df_sex_ratio$genotype_with_n, levels = df_sex_ratio$genotype_with_n[order(df_sex_ratio$sex_ratio_log)])

## plot
sex_ratio_plot <- ggplot(df_sex_ratio, aes(sex_ratio_log, genotype_with_n, color = above)) +
      ## add the lines for the lollipop plot
      geom_segment(aes(x = 0, y = genotype_with_n, xend = sex_ratio_log, yend = genotype_with_n), color = "grey50") +
      ## add the points for the lollipop plot
      geom_point(aes(size = 4)) +
      ## add the wild-type rectangle
      annotate("rect", xmin= -0.23182478, xmax = 1.00105797, ymin=-Inf , ymax=Inf, alpha=0.4, color=NA,linetype = 2, fill="#999999") +
      ## make prettier
      theme_classic() +
      theme(legend.position = "none", text=element_text(size=16, family="Arial")) + 
      ## change axis labels
      labs(y = TeX("$Genotype\ \\left(\\frac{n_{male}}{n_{female}}\\right)$")) +
      #labs(y = expression(paste("Genotype", group("(", frac(paste("n male"), "n female"), ")")))) +
      xlab(expression(paste("Sex Ratio (", log[10], 
                               group("(",
                                      frac(paste("n male + 0.1"), 
                                           paste("n female + 0.1")),
                                   ")"), ")" ))) +
      ## change colours of lollipops
      scale_colour_manual(values = c("#a52b1e", "#016c00")) +
      ## annotate phenotypes
      geom_hline(aes(yintercept = 5.5)) #+
      #geom_hline(aes(yintercept = 14.5))

print(sex_ratio_plot)

### CHANGE TO BIG BRACKET HERE: https://www.overleaf.com/learn/latex/Brackets_and_Parentheses 

#paste("Sex Ratio", "\n", "log10((n male + 0.1)/(n female + 0.1))")
#theme(legend.position = "none", text=element_text(size=16, family="Arial"))

## fraction titles: https://groups.google.com/forum/#!topic/ggplot2/bgNRnZ82hJY 
## https://uc-r.github.io/lollipop

#ggsave(filename = "../images_to_export/sex_ratio_plot.png", device = "png", width = 7, height = 7, units = "in")
```

## B. SCMAP Method

### Build the index

```{r}
### Making an ortholog reference index

## load in mca data
#counts <- read.csv("../scmap/allpb10x_counts.csv", row.names = 1)
#pheno <- read.csv("../scmap/allpb10x_pheno.csv")
#ggplot(pheno, aes(x=PC2_3d, y = PC3_3d,colour=absclust3)) + geom_point()

## load required libraries
library(scmap) #https://bioconductor.org/packages/release/bioc/html/scmap.html 
library(SingleCellExperiment) #

#prep the SCE, if was originally a Suerat object need the dfs to be regular matrices
#pb_filtered_sce_orth <- pb_filtered_sce_orth[, colData(pb_filtered_sce_orth)$absclust3 != "8"]
#sce <- pb_filtered_sce_orth
#pca <- plotPCA(sce)
#pcs <- pca$data
#table(rownames(pcs)==colnames(sce))
#colData(sce) <- cbind(colData(sce), pcs)
#rowData(sce)$feature_symbol <- rowData(sce)$gene

## extract data from Seurat
#cells_tenx <- rownames(tenx.justwt.integrated@meta.data[which(tenx.justwt.integrated@meta.data$experiment == "tenx_5k"), ])
#tenx.justwt.integrated.10k <- subset(tenx.justwt.integrated, cells = cells_tenx)
counts = as.matrix(GetAssayData(tenx.justwt.integrated, slot = "counts", assay = "RNA"))
pheno = as.data.frame(tenx.justwt.integrated@meta.data)

## add UMAP coordinates
df_sex_cell_embeddings <- as.data.frame(tenx.justwt.integrated@reductions[["umap"]]@cell.embeddings)
pheno <- cbind(pheno, df_sex_cell_embeddings)

## Set up object
sce <- SingleCellExperiment(list(counts=counts),
    colData=DataFrame(label=pheno),
    rowData=DataFrame(feature_symbol=rownames(counts)))
sce

## manual logging
#counts_1 <- assay(sce, "counts")
#libsizes <- colSums(counts_1)
#size.factors <- libsizes/mean(libsizes)
#logcounts(sce) <- log2(t(t(counts_1)/size.factors) + 1)
#counts(sce) <- as.matrix(counts(sce))
#logcounts(sce) <- as.matrix(logcounts(sce))
logcounts(sce) <- as.matrix(GetAssayData(tenx.justwt.integrated, slot = "data", assay = "RNA"))

## remove features with duplicated names
sce <- sce[!duplicated(rownames(sce)), ]

## build scmap-cell reference index, save this rds
sce <- selectFeatures(sce, suppress_plot = FALSE, n_features = 2000)
table(rowData(sce)$scmap_features)
set.seed(1)
sce <- indexCell(sce, M = 50, k = 80)
names(metadata(sce)$scmap_cell_index)
#length(metadata(sce)$scmap_cell_index$subcentroids)
#dim(metadata(sce)$scmap_cell_index$subcentroids[[1]])
#metadata(sce)$scmap_cell_index$subcentroids[[1]][,1:5]
#dim(metadata(sce)$scmap_cell_index$subclusters)

#saveRDS(pb_filtered_sce_orth, file="pb_filtered_sce_orthindex_20181109.rds")
```

### Map to the index

```{r}
#rowData(pb_filtered_sce_orth)$feature_symbol <- rowData(pb_filtered_sce_orth)$orth_name
#rownames(pb_filtered_sce_orth) <- rowData(pb_filtered_sce_orth)$orth_name

#prep the SCE, if was originally a Suerat object need the dfs to be regular matrices
#pb_filtered_sce_orth <- pb_filtered_sce_orth[, colData(pb_filtered_sce_orth)$absclust3 != "8"]
#sce <- pb_filtered_sce_orth
#pca <- plotPCA(sce)
#pcs <- pca$data
#table(rownames(pcs)==colnames(sce))

#colData(sce) <- cbind(colData(sce), pcs)

mutants_counts = as.matrix(GetAssayData(mutant_seurat, slot = "counts", assay = "RNA"))
mutants_pheno = as.data.frame(mutant_seurat@meta.data)

#rowData(sce)$feature_symbol <- rowData(sce)$gene
mutants.sce <- SingleCellExperiment(list(counts=mutants_counts),
    colData=DataFrame(label=mutants_pheno),
    rowData=DataFrame(feature_symbol=rownames(mutants_counts)))
mutants.sce

## add log counts
## manual
#counts_1 <- assay(mutants.sce, "counts")
#libsizes <- colSums(counts_1)
#size.factors <- libsizes/mean(libsizes)
#logcounts(mutants.sce) <- log2(t(t(counts_1)/size.factors) + 1)
## from seurat
logcounts(mutants.sce) <- as.matrix(GetAssayData(mutant_seurat, slot = "data", assay = "RNA"))

#Project query data set onto cell index
scmapCell_results <- scmapCell(
  mutants.sce, 
  list(wt = metadata(sce)$scmap_cell_index
  )
)

##Look into the results
# For each dataset there are two matricies. cells matrix contains the top 10 (scmap default) cell IDs of the cells of the reference dataset that a given cell of the projection dataset is closest to:
#   
#   Give assignments in two ways:
#   1. Take the top cell assignment abs clust, if cosine similarity is less than 0.4 (or adjust if needed) mark as unassigned
# 2. For the top 3 nearest neighbors, get a mean of the PCA coordinates and snap to the nearest cell of those coordinates. If any of the top three cells are sim below 0.4 then mark as unassigned.


##Top cell assignment method

## look at assignments
scmapCell_results$wt$cells[, 1:3]
## get cell indexes 
getcells <- scmapCell_results$wt$cells[1, ]
## exctract cells metadata from index
cdsce <- colData(sce)[getcells, ]
## extract similarities
topsim <- scmapCell_results$wt$similarities[1, ]
## get name of top cell matched
mutants.sce$topcell <- rownames(cdsce)
## get sex id
mutants.sce$topcell_ac <- cdsce$label.monocle_sex
## get coordinates of matched cell
mutants.sce$indexPC1 <- cdsce$label.UMAP_1
mutants.sce$indexPC2 <- cdsce$label.UMAP_2
#mutants.sce$pbpt <- cdsce$pseudotime
#mutants.sce$pbbulk <- cdsce$bulk
mutants.sce$topcell_sp <- mutants.sce$topcell_ac
mutants.sce$topsim <- topsim
mutants.sce$topcell_sp[mutants.sce$topsim < 0.4] <- "unassigned"
table(mutants.sce$topcell_sp)
```

```{r}
## see how this overlaps with fluorescence sorted on for 820 to confirm accuracy
df_820 <- colData(mutants.sce)[colData(mutants.sce)$label.genetic_background =="PBANKA_820", ]
table(df_820$topcell_sp, df_820$label.fluoresence_sorted_on)
```

```{r}
table(colData(mutants.sce)$topcell_sp, colData(mutants.sce)$label.identity_name_updated)
```


```{r}
#### TOP 3NN method

#This function makes a list of the PC means for each cell and then do.call below rbinds them into a dataframe called big_data

datalist = list()

for (i in colnames(scmapCell_results$wt$cells)) {
  
  getcellstest <- scmapCell_results$wt$cells[1:3, i]
  cdscetest <- colData(sce)[getcellstest, ]
  PC1mean <- mean(cdscetest$label.UMAP_1)
  PC2mean <- mean(cdscetest$label.UMAP_2)
  # ... make some data
  dat <- data.frame(i, PC1mean, PC2mean)
  dat$i <- i  # maybe you want to keep track of which iteration produced it?
  datalist[[i]] <- dat # add it to your list
}

big_data = do.call(rbind, datalist)
# or big_data <- dplyr::bind_rows(datalist)
# or big_data <- data.table::rbindlist(datalist)

test <- big_data[1, ]

df <- data.frame(X=colData(sce)$label.UMAP_1, Y=colData(sce)$label.UMAP_2, row.names = rownames(colData(sce)))

#the snap function snaps to the nearest cell in PC coordiantes
snap <- function(df, test){
  require(Biobase)
  d <- matchpt(as.matrix(df),
               as.matrix(data.frame(X=test$PC1mean,Y=test$PC2mean)))
  
  min_row <- rownames(d[d$distance==min(d$distance),])
  
  test$X_snap <- unique(df[min_row,"X"])
  test$Y_snap <- unique(df[min_row,"Y"])
  test$pb_cell <- min_row
  
  test
}

#this loops through each cell and in big_data and runs the snap function
datalist2 = list()
colnames(big_data) <- c("sample_id", "PC1mean", "PC2mean")
for (i in rownames(big_data)) {
  test <- big_data[i, ]
  coord <- snap(df, test)
  coord$i <- i
  datalist2[[i]] <- coord
}
big_data2 = do.call(rbind, datalist2)


table(rownames(big_data2)==rownames(colData(mutants.sce)))

allpbcd <- colData(sce)
allpbcd <- as.data.frame(allpbcd)
pbabsclust <- allpbcd[, c("label.pt_id_cols", "label.identity_name_updated", "label.monocle_sex"), drop=FALSE]
pbabsclust$pb_sample_id <- rownames(pbabsclust)

# Now merge the pc cell asignments with their abs clust and get in the right order
big_data3 <- merge(big_data2, pbabsclust, by.x = "pb_cell", by.y = "pb_sample_id", all.x=TRUE, all.y=FALSE)
big_data4 <- big_data3[match(rownames(big_data2), big_data3$sample_id), ]


colors <- c("6"="#78C679",
            "2"="#D1EC9F",
            "0"="#FEB24C",
            "1"="#F4CF63",
            "3"="#FEEEAA",
            "4"="#85B1D3",
            "7"="#9ecae1",
            "5"="#C9E8F1",
            "M"= "#B7B7D8",
            "F"="#9C96C6",
            "unassigned"="black")

ggplot(big_data4, aes(PC1mean, PC2mean)) +
  geom_point(col = big_data4$label.pt_id_cols) +
                     theme_void() +
                     coord_fixed()

#+ geom_point(aes(colour=factor(big_data4$label.label.pt_id_cols))) + scale_color_manual(values = colors) 

                     

ggplot(big_data4, aes(X_snap, Y_snap)) +
  geom_point(col = big_data4$label.pt_id_cols) +
                     theme_void() +
                     coord_fixed() 

#+ geom_point(aes(colour=factor(big_data4$label.label.pt_id_cols))) + scale_color_manual(values = colors) 

##add info to SCE and save colData, to be an assigned cell all 3NN must have a cos sim >0.4
scmapCell_results$wt$similarities[, 1:3]
topsim1 <- scmapCell_results$wt$similarities[1, ]
topsim2 <- scmapCell_results$wt$similarities[2, ]
topsim3 <- scmapCell_results$wt$similarities[3, ]

table(big_data4$sample_id==rownames(colData(mutants.sce)))
#pfobj$pb_cell <- big_data4$pb_cell
#pfobj$PC1mean <- big_data4$PC1mean
bd4 <- big_data4[, c("pb_cell", "sample_id", "PC1mean", "PC2mean", "X_snap", "Y_snap", "label.pt_id_cols", "label.identity_name_updated", "label.monocle_sex")]

colData(mutants.sce) <- cbind(colData(mutants.sce), bd4)

mutants.sce$topsim1 <- topsim1
mutants.sce$topsim2 <- topsim2
mutants.sce$topsim3 <- topsim3
mutants.sce$stage_pred <- big_data4$label.monocle_sex
mutants.sce$stage_pred[mutants.sce$topsim1 < 0.4 | mutants.sce$topsim2 < 0.4 | mutants.sce$topsim3 < 0.4] <- "unassigned"
table(mutants.sce$stage_pred)

#write.csv(bd4, "pfcellassignmentswithmean3nn_20181029.csv")
#write.csv(colData(pfobj), "pf3d7100scmapclusts2methodindexn100_20190107.csv")
```

```{r}
## see how this overlaps with fluorescence sorted on for 820 to confirm accuracy
df_820 <- colData(mutants.sce)[colData(mutants.sce)$label.genetic_background =="PBANKA_820", ]
table(df_820$stage_pred, df_820$label.fluoresence_sorted_on)
```

```{r}
table(colData(mutants.sce)$stage_pred, colData(mutants.sce)$label.identity_name_updated)
```


## make final plot with MCA data below
```{r}
## Read in MCA data
pheno <- read.csv("../scmap/allpb10x_pheno.csv")
## extract rows needed for plotting
df_plot <- pheno[,c("PC2_3d", "PC3_3d", "absclust3")]
df_plot$experiment <- df_plot$absclust3

## extract rows needed for plotting from mapped object
df_plot_pm <- as.data.frame(colData(pm_ss2_field.sce.orth))
df_plot_pm <- df_plot_pm[ ,c("X_snap", "Y_snap", "label.absclust3")]
df_plot_pm$experiment <- "pm"
colnames(df_plot_pm) <- c("PC2_3d", "PC3_3d", "absclust3", "experiment")

## extract rows needed for plotting from mapped object
df_plot_pf <- as.data.frame(colData(pf_ss2_field.sce.orth))
df_plot_pf <- df_plot_pf[ ,c("X_snap", "Y_snap", "label.absclust3")]
df_plot_pf$experiment <- "pf"
colnames(df_plot_pf) <- c("PC2_3d", "PC3_3d", "absclust3", "experiment")

## bind together
df_plot <- rbind(df_plot, df_plot_pf, df_plot_pm)

colors <- c("6"="#78C679",
            "2"="#D1EC9F",
            "0"="#FEB24C",
            "1"="#F4CF63",
            "3"="#FEEEAA",
            "4"="#85B1D3",
            "7"="#9ecae1",
            "5"="#C9E8F1",
            "M"= "#B7B7D8",
            "F"="#9C96C6",
            "unassigned"="black",
            "pm" = "#dc65a4",
            "pf" = "#24245f")

## add alpha for plotting
df_plot$alpha[!df_plot$experiment == "pm"] <- 0.5
df_plot$alpha[df_plot$experiment == "pm"] <- 1
df_plot$alpha[df_plot$experiment == "pf"] <- 1

##plot
scmap_pca <- ggplot(df_plot, aes(x=PC2_3d, y = PC3_3d,colour=experiment, alpha = alpha)) + 
  geom_point() +
  theme_void() +
  scale_color_manual(values = colors) +
  ## remove alpha scale
  scale_alpha_continuous(guide=FALSE)
  ## draw ring around field samples
  #geom_point(data=df_plot[df_plot$experiment == "pm", ], pch=21, fill=NA, size=2, colour="black", stroke=1) +
  ## make non-field samples more opaque
  #geom_point(data=df_plot[-df_plot$experiment == "pm", ], alpha = 0.2)

## view
scmap_pca_2 <- scmap_pca + 
  guides(colour=guide_legend(override.aes = list(size=4)))

scmap_pca_2
```

save
```{r}
#ggsave("../images_to_export/field_samples_scmap.png", plot = scmap_pca_2, device = "png", path = NULL, scale = 1, width = 15, height = 10, units = "cm", dpi = 300, limitsize = TRUE)
```

## Overlap and validation

```{r}

```

```{r}

```

# 11. Save and Export {.tabset}

## save

Save environment
```{r}
## This saves everything in the global environment for easy recall later
#save.image(file = "GCSKO_merge.RData")
#load(file = "GCSKO_merge.RData")
```

Save object(s)
```{r}
## Save an object to a file
saveRDS(tenx.justwt.integrated.sex, file = "../data_to_export/tenx.justwt.integrated.sex.RDS")
## Restore the object
#readRDS(file = "../data_to_export/tenx.mutant.integrated.sex.RDS")

## save integrated object to file
saveRDS(tenx.justwt.integrated, file = "../data_to_export/tenx.justwt.integrated.RDS") 
## restore the object
#tenx.justwt.integrated <- readRDS("../data_to_export/tenx.justwt.integrated.RDS")

## save monocle object
saveRDS(monocle.object.all, file = "../data_to_export/monocle.object.all.RDS") 
## restore the object
#monocle.object.all <- readRDS("../data_to_export/monocle.object.all.RDS")

## save integrated object to file
#saveRDS(GCSKO_mutants, file = "../data_to_export/GCSKO_mutants.RDS") 
## restore the object
#tenx.justwt.integrated <- readRDS("../data_to_export/tenx.justwt.integrated.RDS")

## extract UMAP coordinates and metadata
## extract metadata
mca_website_df <- tenx.justwt.integrated@meta.data

## add umap - merge on row names and then make sure row names are not dropped or added
mca_website_df <- transform(merge(mca_website_df, tenx.justwt.integrated@reductions[["umap"]]@cell.embeddings, by=0, all=TRUE), row.names=Row.names, Row.names=NULL)

## send this to sunil:
# ggplot(mca_website_df, aes(x = UMAP_1, y = UMAP_2)) +
# geom_point(col = mca_website_df$pt_id_cols) +
# theme_void() +
# coord_fixed()

## export
write.csv(mca_website_df, file = "../data_to_export/mca_website_df.csv")
```

Clean up
```{r}
#rm(ss2_wt_cells)
#rm(tenx.justwt.integrated)
#rm(tenx.justwt.list)
```

## final figure construction

[Cowplot](https://wilkelab.org/cowplot/articles/plot_grid.html)(plot_grid), [patchwork](https://cran.r-project.org/web/packages/patchwork/vignettes/patchwork.html)(wrap_plots), and [ggpubr](http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/81-ggplot2-easy-way-to-mix-multiple-graphs-on-the-same-page/) can all allow multiple plots to be plotted together. 

```{r, fig.height = 11.7, fig.width = 8.3}
## A
# umap_id_pt
## B
# marker gene expression
## C
# Mutant gene expression
## D
# Modules

#Figure_A <- grid.arrange(arrangeGrob(QC_composite_plot, QC_mito_violin, QC_mito_graph, QC_by_genotype, mapping_rate_plot), nrow=3), nrow=2, heights=c(10,2))

## cowplot method
## can use this for labels: toupper(letters)[1:10]

## C. Mutant genes
mutant_genes_figure <- plot_grid(
                                 ## marker genes starts
                                 marker_gene_plot_FAMB,
                                 marker_gene_plot_MSP8,
                                 marker_gene_plot_MSP1,
                                 marker_gene_plot_AP2G,
                                 marker_gene_plot_CCP2,
                                 marker_gene_plot_MG1,
                                 ## mutant genes starts  
                                 marker_gene_plot_gd1,
                                 marker_gene_plot_md1,
                                 marker_gene_plot_md2,
                                 marker_gene_plot_md3,
                                 marker_gene_plot_md4, 
                                 marker_gene_plot_md5,
                                 marker_gene_plot_fd1,
                                 marker_gene_plot_fd2,
                                 marker_gene_plot_fd3,
                                 marker_gene_plot_fd4, 
  label_size = 1, 
  nrow = 4)

## labels as uppercase letters:
#c(toupper(letters)[2:17])
# labels as roman numerals:
# c(tolower(as.roman(c(2:17))

Figure_publication <- plot_grid(umap_id_pt + theme(plot.margin = unit(c(0, 0, 0, 0), "cm")), 
                                mutant_genes_figure,
                                ## add empty plot to give spacing
                                ggplot() + theme_void(),
                                labels = c('A', 'B'), 
                                label_size = 12, 
                                ncol = 2, 
                                nrow=2, 
                                rel_heights = c(1, 1, 4), 
                                rel_widths = c(1, 2, 3))

Figure_publication
```

save
```{r}
ggsave("../images_to_export/Figure_C.png", plot = Figure_publication, device = "png", path = NULL, scale = 1, width = 21, height = 29.7, units = "cm", dpi = 300, limitsize = TRUE)
```

```{r, fig.height = 11.7, fig.width = 8.3}
layout <- "
AAABBB
CCDDEE
FFGGHH
IIJJKK
"

Figure_publication_A <- (composition_umap_10x + theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 10))) +
(composition_umap_ss2 + theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 10))) + 
plot_layout(ncol = 2)

Figure_publication <- Figure_publication_A +
#(UMAP_hoo + theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 10)) + labs(title="Predicted Asexual Cycle Real Timepoint")) +
#(UMAP_kasia + theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 10)) + labs(title="Predicted Gametocytogenesis Real Timepoint")) +
  ## marker genes
marker_gene_plot_FAMB +
marker_gene_plot_MSP8 +
marker_gene_plot_MSP1 +
marker_gene_plot_CDPK5 +
marker_gene_plot_AP2G +
marker_gene_plot_CCP2 +
marker_gene_plot_MG1 +
marker_gene_plot_ORC1 +
marker_gene_plot_MCM4 +
  ## mutant genes starts  
#marker_gene_plot_gd1 +
#marker_gene_plot_md1 +
#marker_gene_plot_md2 +
#marker_gene_plot_md3 +
#marker_gene_plot_md4 + 
#marker_gene_plot_md5 +
#marker_gene_plot_fd1 +
#marker_gene_plot_fd2 +
#marker_gene_plot_fd3 +
#marker_gene_plot_fd4 + 
plot_layout(ncol = 6, 
            widths = c(1, 1, 1, 1, 1 ,1),
            heights = c(2, 1, 1, 1),
            design = layout
            )

Figure_publication
```

save
```{r}
ggsave("../images_to_export/Figure_S2.png", plot = Figure_publication, device = "png", path = NULL, scale = 1, width = 21, height = 29.7, units = "cm", dpi = 300, limitsize = TRUE)
```

# Appendix {.tabset}

## Session Info 
```{r, echo = FALSE}
sessionInfo()
```

# Expression plots

#### Expression - raw - Viridis
```{r, fig.height = 15, fig.width = 15}
## find a good ring marker, to see if there is a better one than the ones reported
#markers_ring <- FindMarkers(tenx.justwt.integrated, ident.1 = c("4", "5", "16", "11", "7", "3", "9", "0", "22"))
#head(markers_ring)

# PBANKA-1319500 - CCP2 - female - used in 820 line
# PBANKA-0416100 - MG1 - dynenin heavy chain - male - used in 820 line
# PBANKA-1437500 - AP2G - commitment
# PBANKA-0831000 - MSP1 - late asexual
# PBANKA-1102200 - MSP8 - early asexual (from Bozdech paper)
# PBANKA-0711900 - HSP70 - promoter used for GFP and RFP expression in the mutants
# PBANKA-1400400 - FAMB - ring marker - discovered by looking for marker genes in data
# PBANKA-0722600 - Fam-b2 - ring marker - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5113031/ 


marker_gene_plot_CCP2 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-1319500", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste(italic(ccp2), "(Female)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

marker_gene_plot_MG1 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-0416100", coord.fixed = TRUE, 
                                    #min.cutoff = "q1", 
                                    dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste(italic(mg1), "(Male)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

marker_gene_plot_AP2G <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-1437500", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste(italic(ap2g), "(Commitment)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

marker_gene_plot_MSP1 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-0831000", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste(italic(msp1), "(Schizont)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

marker_gene_plot_MSP8 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-1102200", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste(italic(msp8), "(Asexual)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

marker_gene_plot_SBP1 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-1101300", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste(italic(sbp1), "(Ring)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

marker_gene_plot_FAMB <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-0722600", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste(italic(fam-b2),  "(Ring)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

marker_gene_plot_HSP70 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-0711900", coord.fixed = TRUE, 
                                      #min.cutoff = "q1", 
                                      dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste(italic(HSP70),  "(Reporter)","\n", "PBANKA_0711900")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

##original label:
# labs(title = paste("(CCP2; Female)","\n", "PBANKA_1319500"))

## plot
## cowplot method
marker_gene_plot_all <- plot_grid(marker_gene_plot_FAMB, marker_gene_plot_MSP8, marker_gene_plot_MSP1, marker_gene_plot_AP2G, marker_gene_plot_CCP2, marker_gene_plot_MG1, marker_gene_plot_HSP70, nrow=3)

marker_gene_plot_all

## patchwork method
#marker_gene_plot_FAMB + marker_gene_plot_MSP8 + marker_gene_plot_MSP1 + marker_gene_plot_AP2G + marker_gene_plot_CCP2 + marker_gene_plot_MG1 + marker_gene_plot_HSP70
```

#### Expression - raw - purple
```{r, fig.height = 15, fig.width = 15}

marker_gene_ramp <- colorRampPalette(c("#D3D3D3", "#1D1564"))(50)

marker_gene_plot_CCP2 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-1319500", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("CCP2 (Female)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=marker_gene_ramp)

marker_gene_plot_MG1 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-0416100", coord.fixed = TRUE, 
                                    #min.cutoff = "q1", 
                                    dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("MG1 (Male)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=marker_gene_ramp)

marker_gene_plot_AP2G <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-1437500", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("AP2G (Commitment)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=marker_gene_ramp)

marker_gene_plot_MSP1 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-0831000", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("MSP1 (Schizont)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=marker_gene_ramp)

marker_gene_plot_MSP8 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-1102200", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("MSP8 (Asexual)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=marker_gene_ramp)

marker_gene_plot_SBP1 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-1101300", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("SBP1 (Ring)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=marker_gene_ramp)

marker_gene_plot_FAMB <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-0722600", coord.fixed = TRUE, 
                                     #min.cutoff = "q1", 
                                     dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("Fam-b2 (Ring)")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=marker_gene_ramp)

marker_gene_plot_HSP70 <- FeaturePlot(tenx.justwt.integrated, features = "PBANKA-0711900", coord.fixed = TRUE, 
                                      #min.cutoff = "q1", 
                                      dims = c(1,2), reduction = "umap", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("(HSP70; Reporter)","\n", "PBANKA_0711900")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=marker_gene_ramp)

## plot
## cowplot method
marker_gene_plot_all <- plot_grid(marker_gene_plot_FAMB, marker_gene_plot_MSP8, marker_gene_plot_MSP1, marker_gene_plot_AP2G, marker_gene_plot_CCP2, marker_gene_plot_MG1, marker_gene_plot_HSP70, nrow=3)

marker_gene_plot_all
```

#### Expression - data - purple

```{r, fig.height = 10, fig.width = 10}
# PBANKA-1418100        GCSKO-17  FD3   
# PBANKA-0102400         GCSKO-2  MD3 
# PBANKA-0716500        GCSKO-19  MD4 
# PBANKA-1435200        GCSKO-20  FD4 
# PBANKA-0902300        GCSKO-13  FD2
# PBANKA-0413400    GCSKO-10_820  MD5
# PBANKA-0828000         GCSKO-3  GD1
# PBANKA-1302700       GCSKO-oom  MD1 
# PBANKA-1447900        GCSKO-29  MD2
# PBANKA-1454800        GCSKO-21  FD1
# PBANKA-1144800        GCSKO-28  FD5


marker_gene_plot_17 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1418100", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("fd3")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_2 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0102400", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md3")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_19 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0716500", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md4")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_20 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1435200", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("fd4")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_13 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0902300", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("fd2")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_10 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0413400", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md5")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_3 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-0828000", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("gd1")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_oom <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1302700", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md1")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_29 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1447900", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("md2")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

marker_gene_plot_21 <- FeaturePlot(tenx.justwt.integrated, dims = c(1,2), reduction = "umap", features = "PBANKA-1454800", coord.fixed = TRUE,  pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("fd1")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_color_continuous_sequential(palette = "Purples 2") +
  ## add sex symbols
  annotate("text", x = 3.8, y = 1.5, label = male_symbol, size=7, color="gray") + 
  annotate("text", x = 2, y = 2.8, label = female_symbol, size=7, color="gray")

##original label:
# labs(title = paste("(CCP2; Female)","\n", "PBANKA_1319500"))

## make composite plot
mutant_expression_composite <- wrap_plots(marker_gene_plot_17 , marker_gene_plot_2 , marker_gene_plot_19 , marker_gene_plot_20 , marker_gene_plot_13 , marker_gene_plot_10 , marker_gene_plot_3 , marker_gene_plot_oom , marker_gene_plot_29 , marker_gene_plot_21 , ncol = 4)
           
## print
mutant_expression_composite
```

# Density Plots

#### Density - purple

```{r}
# PBANKA-1319500 - CCP2 - female - used in 820 line
# PBANKA-0416100 - MG1 - dynenin heavy chain - male - used in 820 line
# PBANKA-1437500 - AP2G - commitment
# PBANKA-0831000 - MSP1 - late asexual
# PBANKA-1102200 - MSP8 - early asexual (from Bozdech paper)
# PBANKA-0711900 - HSP70 - promoter used for GFP and RFP expression in the mutants
# PBANKA-1400400 - FAMB - ring marker - discovered by looking for marker genes in data
# PBANKA-0722600 - Fam-b2 - ring marker - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5113031/ 

markers_list <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-0413400", "PBANKA-1454800", "PBANKA-0902300", "PBANKA-1418100", "PBANKA-1435200")

list_of_density_plots_mutant_genes <- plot_density(tenx.justwt.integrated, markers_list, joint = FALSE, combine = FALSE, dims = c(1,2), pal = "plasma", method = "ks")

## make composite plot
UMAP_composite_mutant_genes <- wrap_plots(list_of_density_plots_mutant_genes[[1]] + 
                                            coord_fixed() + 
                                            theme_void() + 
                                            labs(title = paste("gd1")) + 
                                            scale_colour_gradientn(colours=marker_gene_ramp) + 
                                            guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[2]] + 
                               coord_fixed() + theme_void() + 
                               labs(title = paste("md1")) + 
                               scale_colour_gradientn(colours=marker_gene_ramp) + 
                               guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[3]] + coord_fixed() + theme_void() + labs(title = paste("md2")) + scale_colour_gradientn(colours=marker_gene_ramp) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[4]] + coord_fixed() + theme_void() + labs(title = paste("md3")) + scale_colour_gradientn(colours=marker_gene_ramp) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[5]] + coord_fixed() + theme_void() + 
  labs(title = paste("md4")) + scale_colour_gradientn(colours=marker_gene_ramp) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[6]] + coord_fixed() + theme_void() + labs(title = paste("md5")) + scale_colour_gradientn(colours=marker_gene_ramp) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[7]] + coord_fixed() + theme_void() + labs(title = paste("fd1")) + scale_colour_gradientn(colours=marker_gene_ramp) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[8]] + coord_fixed() + theme_void() + labs(title = paste("fd2")) + scale_colour_gradientn(colours=marker_gene_ramp) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[9]] + coord_fixed() + theme_void() + labs(title = paste("fd3")) + scale_colour_gradientn(colours=marker_gene_ramp) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[10]] + coord_fixed() + theme_void() + labs(title = paste("fd4")) + scale_colour_gradientn(colours=marker_gene_ramp) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             ncol = 4)

UMAP_composite_mutant_genes
```


#### Density -viridis

```{r}
# PBANKA-1319500 - CCP2 - female - used in 820 line
# PBANKA-0416100 - MG1 - dynenin heavy chain - male - used in 820 line
# PBANKA-1437500 - AP2G - commitment
# PBANKA-0831000 - MSP1 - late asexual
# PBANKA-1102200 - MSP8 - early asexual (from Bozdech paper)
# PBANKA-0711900 - HSP70 - promoter used for GFP and RFP expression in the mutants
# PBANKA-1400400 - FAMB - ring marker - discovered by looking for marker genes in data
# PBANKA-0722600 - Fam-b2 - ring marker - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5113031/ 

markers_list <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-0413400", "PBANKA-1454800", "PBANKA-0902300", "PBANKA-1418100", "PBANKA-1435200")

list_of_density_plots_mutant_genes <- plot_density(tenx.justwt.integrated, markers_list, joint = FALSE, combine = FALSE, dims = c(1,2), pal = "plasma", method = "ks")

## make composite plot
UMAP_composite_mutant_genes <- wrap_plots(list_of_density_plots_mutant_genes[[1]] + 
                                            coord_fixed() + 
                                            theme_void() + 
                                            labs(title = paste("gd1")) + 
                                            scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + 
                                            guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[2]] + 
                               coord_fixed() + theme_void() + 
                               labs(title = paste("md1")) + 
                               scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + 
                               guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[3]] + coord_fixed() + theme_void() + labs(title = paste("md2")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[4]] + coord_fixed() + theme_void() + labs(title = paste("md3")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)) )+ guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[5]] + coord_fixed() + theme_void() + 
  labs(title = paste("md4")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[6]] + coord_fixed() + theme_void() + labs(title = paste("md5")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[7]] + coord_fixed() + theme_void() + labs(title = paste("fd1")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[8]] + coord_fixed() + theme_void() + labs(title = paste("fd2")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[9]] + coord_fixed() + theme_void() + labs(title = paste("fd3")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[10]] + coord_fixed() + theme_void() + labs(title = paste("fd4")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             ncol = 4)

UMAP_composite_mutant_genes
```

#### Density - viridis
```{r, fig.height = 10, fig.width = 10}
# PBANKA-0828000         GCSKO-3  GD1

# PBANKA-1302700       GCSKO-oom  MD1 
# PBANKA-1447900        GCSKO-29  MD2
# PBANKA-0102400         GCSKO-2  MD3 
# PBANKA-0716500        GCSKO-19  MD4 
# PBANKA-0413400    GCSKO-10_820  MD5

# PBANKA-1454800        GCSKO-21  FD1
# PBANKA-0902300        GCSKO-13  FD2
# PBANKA-1418100        GCSKO-17  FD3   
# PBANKA-1435200        GCSKO-20  FD4 

markers_list <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-0413400", "PBANKA-1454800", "PBANKA-0902300", "PBANKA-1418100", "PBANKA-1435200")

list_of_density_plots_mutant_genes <- plot_density(tenx.justwt.integrated, markers_list, joint = FALSE, combine = FALSE, dims = c(1,2), pal = "plasma", method = "ks")

## make composite plot
UMAP_composite_mutant_genes <- wrap_plots(list_of_density_plots_mutant_genes[[1]] + 
                                            coord_fixed() + 
                                            theme_void() + 
                                            labs(title = paste("gd1")) + 
                                            scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + 
                                            guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[2]] + 
                               coord_fixed() + theme_void() + 
                               labs(title = paste("md1")) + 
                               scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + 
                               guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[3]] + coord_fixed() + theme_void() + labs(title = paste("md2")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[4]] + coord_fixed() + theme_void() + labs(title = paste("md3")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)) )+ guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[5]] + coord_fixed() + theme_void() + 
  labs(title = paste("md4")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[6]] + coord_fixed() + theme_void() + labs(title = paste("md5")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[7]] + coord_fixed() + theme_void() + labs(title = paste("fd1")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[8]] + coord_fixed() + theme_void() + labs(title = paste("fd2")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[9]] + coord_fixed() + theme_void() + labs(title = paste("fd3")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[10]] + coord_fixed() + theme_void() + labs(title = paste("fd4")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             ncol = 4)

UMAP_composite_mutant_genes

## IF putting in MAIN - list_of_density_plots_mutant_genes[[2]] + coord_fixed() + theme_void() + theme(plot.title = element_text(hjust = 0.5), legend.text = element_text(size = 8), text=element_text(size=9)) + labs(title = paste("md1")) + scale_colour_gradientn(colours=c("#DCDCDC", plasma(30))) + guides(colour = guide_colourbar(barwidth = 0.3, barheight = 4.0, title = ""))
```

```{r, fig.height = 10, fig.width = 10}
# PBANKA-0828000         GCSKO-3  GD1

# PBANKA-1302700       GCSKO-oom  MD1 
# PBANKA-1447900        GCSKO-29  MD2
# PBANKA-0102400         GCSKO-2  MD3 
# PBANKA-0716500        GCSKO-19  MD4 
# PBANKA-0413400    GCSKO-10_820  MD5

# PBANKA-1454800        GCSKO-21  FD1
# PBANKA-0902300        GCSKO-13  FD2
# PBANKA-1418100        GCSKO-17  FD3   
# PBANKA-1435200        GCSKO-20  FD4 

markers_list <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-0413400", "PBANKA-1454800", "PBANKA-0902300", "PBANKA-1418100", "PBANKA-1435200")

list_of_density_plots_mutant_genes <- plot_density(tenx.justwt.integrated, markers_list, joint = FALSE, combine = FALSE, dims = c(1,2), pal = "plasma", method = "wkde", slot = 'data')

## make composite plot
UMAP_composite_mutant_genes <- wrap_plots(list_of_density_plots_mutant_genes[[1]] + 
                                            coord_fixed() + 
                                            theme_void() + 
                                            labs(title = paste("gd1")) + 
                                            scale_color_continuous_sequential(palette = "Purples 2") + 
                                            guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[2]] + 
                               coord_fixed() + theme_void() + 
                               labs(title = paste("md1")) + 
                               scale_color_continuous_sequential(palette = "Purples 2") + 
                               guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[3]] + 
                               coord_fixed() + 
                               theme_void() + 
                               labs(title = paste("md2")) + scale_color_continuous_sequential(palette = "Purples 2") + 
                               guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[4]] + coord_fixed() + theme_void() + labs(title = paste("md3")) + scale_color_continuous_sequential(palette = "Purples 2") + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[5]] + coord_fixed() + theme_void() + 
  labs(title = paste("md4")) + scale_color_continuous_sequential(palette = "Purples 2") + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[6]] + coord_fixed() + theme_void() + labs(title = paste("md5")) + scale_color_continuous_sequential(palette = "Purples 2") + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[7]] + coord_fixed() + theme_void() + labs(title = paste("fd1")) + scale_color_continuous_sequential(palette = "Purples 2") + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[8]] + coord_fixed() + theme_void() + labs(title = paste("fd2")) + scale_color_continuous_sequential(palette = "Purples 2") + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[9]] + coord_fixed() + theme_void() + labs(title = paste("fd3")) + scale_color_continuous_sequential(palette = "Purples 2") + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             list_of_density_plots_mutant_genes[[10]] + coord_fixed() + theme_void() + labs(title = paste("fd4")) + scale_color_continuous_sequential(palette = "Purples 2") + guides(colour = guide_colourbar(barwidth = 0.5, title = "")),
                             ncol = 4)

## old colour scale: scale_colour_gradientn(colours=marker_gene_ramp )

UMAP_composite_mutant_genes
```





